Andrei Betlen
cd3f1bb387
feat: Update llama.cpp
2024-06-04 00:35:47 -04:00
Andrei Betlen
6b018e00b1
misc: Improve llava error messages
2024-06-03 11:19:10 -04:00
Andrei Betlen
a6457ba74b
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-06-01 18:10:13 -04:00
Andrei Betlen
af3ed503e9
fix: Use numpy recarray for candidates data, fixes bug with temp < 0
2024-06-01 18:09:24 -04:00
Andrei Betlen
165b4dc6c1
fix: Fix typo in Llama3VisionAlphaChatHandler. Closes #1488
2024-05-29 02:29:44 -04:00
Andrei Betlen
91d05aba46
fix: adjust kv_override member names to match llama.cpp
2024-05-29 02:28:58 -04:00
Andrei Betlen
df45a4b3fe
fix: fix string value kv_overrides. Closes #1487
2024-05-29 02:02:22 -04:00
Andrei Betlen
10b7c50cd2
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-05-28 22:52:30 -04:00
Andrei Betlen
2907c26906
misc: Update debug build to keep all debug symbols for easier gdb debugging
2024-05-28 22:52:28 -04:00
Andrei Betlen
c26004b1be
feat: Update llama.cpp
2024-05-28 22:52:03 -04:00
dependabot[bot]
c564007ff6
chore(deps): bump pypa/cibuildwheel from 2.18.0 to 2.18.1 (#1472)
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-27 10:57:17 -04:00
Andrei Betlen
454c9bb1cb
feat: Update llama.cpp
2024-05-27 10:51:57 -04:00
Andrei Betlen
2d89964147
docs: Fix table formatting
2024-05-24 11:55:41 -04:00
Andrei Betlen
9e8d7d55bd
fix(docs): Fix link typo
2024-05-24 11:55:01 -04:00
Andrei Betlen
ec43e8920f
docs: Update multi-modal model section
2024-05-24 11:54:15 -04:00
Andrei Betlen
a4c9ab885d
chore: Bump version
2024-05-24 01:59:25 -04:00
Linghan Zhong
5cae1040e3
feat: Improve Llama.eval performance by avoiding list conversion (#1476)
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-24 01:49:44 -04:00
Andrei Betlen
087cc0b036
feat: Update llama.cpp
2024-05-24 01:43:36 -04:00
Andrei Betlen
5a595f035a
feat: Update llama.cpp
2024-05-22 02:40:31 -04:00
Andrei Betlen
3dbfec74e7
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-05-18 01:19:20 -04:00
Andrei Betlen
d8a3b013c3
feat: Update llama.cpp
2024-05-18 01:19:19 -04:00
Radoslav Gerganov
03f171e810
example: LLM inference with Ray Serve (#1465)
2024-05-17 13:27:26 -04:00
Andrei Betlen
b564d05806
chore: Bump version
2024-05-16 00:41:21 -04:00
Andrei Betlen
d99a6ba607
fix: segfault for models without eos / bos tokens. Closes #1463
2024-05-16 00:37:27 -04:00
Andrei Betlen
e811a81066
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-05-15 23:59:18 -04:00
Andrei Betlen
ca8e3c967d
feat: Update llama.cpp
2024-05-15 23:59:17 -04:00
twaka
5212fb08ae
feat: add MinTokensLogitProcessor and min_tokens argument to server (#1333)
* implement min_tokens
* set default to 0
* pass min_tokens
* fix
* remove copy
* implement MinTokensLogitsProcessor
* format
* fix condition
2024-05-14 09:50:53 -04:00
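The `min_tokens` idea in the commit above can be illustrated with a small sketch. This is not the library's implementation: the class name `MinTokensSketch`, the use of plain Python lists for scores, and the way the prompt length is recorded on the first call are all assumptions made for illustration. The sketch suppresses the end-of-sequence token until at least `min_tokens` tokens have been generated.

```python
import math
from typing import List, Optional


class MinTokensSketch:
    """Hypothetical logits processor: block EOS until min_tokens are generated."""

    def __init__(self, min_tokens: int, token_eos: int) -> None:
        self.min_tokens = min_tokens
        self.token_eos = token_eos
        # Assumption: the first call sees only the prompt, so its length is the baseline.
        self.prompt_len: Optional[int] = None

    def __call__(self, input_ids: List[int], scores: List[float]) -> List[float]:
        if self.prompt_len is None:
            self.prompt_len = len(input_ids)
        generated = len(input_ids) - self.prompt_len
        if generated < self.min_tokens:
            # Copy so the caller's list is left untouched, then mask the EOS logit.
            scores = list(scores)
            scores[self.token_eos] = -math.inf
        return scores
```

With `min_tokens=0` (the default mentioned in the commit body) the processor never masks anything, so it is a no-op.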
Sigbjørn Skjæret
389e09c2f5
misc: Remove unnecessary metadata lookups (#1448)
Special tokens are already mapped from metadata by llama.cpp
2024-05-14 09:44:09 -04:00
dependabot[bot]
4b54f79330
chore(deps): bump pypa/cibuildwheel from 2.17.0 to 2.18.0 (#1453)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.17.0 to 2.18.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.17.0...v2.18.0)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-14 09:35:52 -04:00
Andrei Betlen
50f5c74ecf
Update llama.cpp
2024-05-14 09:30:04 -04:00
Andrei Betlen
43ba1526c8
feat: Update llama.cpp
2024-05-13 09:39:08 -04:00
Andrei Betlen
3f8e17af63
fix(ci): Use version without extra platform tag in pep503 index
2024-05-12 11:45:55 -04:00
Andrei Betlen
3c19faa0d4
chore: Bump version
2024-05-12 10:32:52 -04:00
Andrei Betlen
3fe8e9a8f3
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-05-12 10:30:24 -04:00
Andrei Betlen
9dc5e20fb6
feat: Update llama.cpp
2024-05-12 10:30:23 -04:00
Peng Yu
1547202b77
docs: Fix typo in README.md (#1444)
2024-05-10 10:35:51 -04:00
Andrei Betlen
7f59856fa6
fix: Enable CUDA backend for llava. Closes #1324
2024-05-10 10:18:47 -04:00
Andrei Betlen
73165021bb
chore: Bump version
2024-05-10 09:44:18 -04:00
Andrei Betlen
eafb6ec5e8
feat: Update llama.cpp
2024-05-10 08:39:55 -04:00
Andrei Betlen
ac55d0a175
fix: Clear kv cache to avoid kv bug when image is evaluated first
2024-05-10 02:38:10 -04:00
Andrei Betlen
4badac3a60
chore: Bump version
2024-05-10 00:56:19 -04:00
Sigbjørn Skjæret
561e880654
fix(security): Render all jinja templates in immutable sandbox (#1441)
Chat templates are rendered with ImmutableSandboxedEnvironment in transformers, so there is no need to do otherwise here.
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-10 00:49:40 -04:00
Patrick Peng
b454f40a9a
Merge pull request from GHSA-56xg-wfcc-g829
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-10 00:47:56 -04:00
Sigbjørn Skjæret
5ab40e6167
feat: Support multiple chat templates - step 1 (#1396)
* Support multiple chat templates - step 1
As a first step, allow the user to select a template from metadata with the chat_format parameter, in the form `chat_template.name`.
* register chat templates to self.chat_formats instead of globally
* Don't expose internal chat handlers yet
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-09 09:49:09 -04:00
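A rough sketch of how templates stored in model metadata might be exposed under `chat_template.name` keys, as the commit above describes. The helper `collect_chat_templates`, the metadata key layout (`tokenizer.chat_template` and `tokenizer.chat_template.<name>`), and the `chat_template.default` label for the unnamed template are assumptions for illustration, not a statement of the library's actual code.

```python
from typing import Dict


def collect_chat_templates(metadata: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: map metadata entries to chat_format-style names."""
    strip = "tokenizer."                    # assumed common key prefix
    named = "tokenizer.chat_template."      # assumed prefix for named templates
    choices = {
        key[len(strip):]: template          # e.g. "chat_template.tool_use"
        for key, template in metadata.items()
        if key.startswith(named)
    }
    # Assumption: the unnamed template is exposed under a ".default" suffix.
    if "tokenizer.chat_template" in metadata:
        choices["chat_template.default"] = metadata["tokenizer.chat_template"]
    return choices
```

Under these assumptions, a caller would pick one of the returned keys as the `chat_format` value.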
Andrei Betlen
bf66a283e8
chore: Bump version
2024-05-09 03:02:52 -04:00
Andrei Betlen
3757328b70
fix: free last image embed in llava chat handler
2024-05-08 22:16:18 -04:00
Andrei Betlen
77122638b4
fix: Make leading bos_token optional for image chat formats, fix nanollava system message
2024-05-08 13:12:31 -04:00
Andrei Betlen
2a39b99575
feat: Update llama.cpp
2024-05-08 08:42:22 -04:00
Andrei Betlen
9ce5cb376a
chore: Bump version
2024-05-08 02:36:42 -04:00
Sigbjørn Skjæret
4a7122d22f
feat: fill-in-middle support (#1386)
* Proper fill-in-middle support
Use prefix/middle/suffix tokens when metadata is present in GGUF, for example in [this](https://huggingface.co/CISCai/CodeQwen1.5-7B-Chat-SOTA-GGUF) one.
* fall back to internal prefix/middle/suffix id
In some cases llama.cpp will make a guess at fim tokens; use them if there's no metadata.
* typo--
* don't insert special tokens that are not there in suffix
Note: add_bos is misnamed; it's actually add_special and can cause several special tokens to be added to the token list (the special parameter is actually parse_special).
* don't add/parse any special tokens when using fim
I've left original behavior when no fim tokens are found, but this should perhaps be re-evaluated.
* don't append suffix to prompt_tokens unless fim tokens are detected
* make sure we only do this for fim
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-08 02:26:22 -04:00
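The fill-in-middle behavior described in the commit above can be sketched roughly as follows. The function `build_fim_tokens`, the prefix-suffix-middle ordering, and the convention that a negative id means "token not available" are assumptions for illustration only; the commit itself only states that prefix/middle/suffix tokens are used when present and that the suffix is not appended otherwise.

```python
from typing import List


def build_fim_tokens(
    prompt_ids: List[int],
    suffix_ids: List[int],
    prefix_tok: int,
    suffix_tok: int,
    middle_tok: int,
) -> List[int]:
    """Hypothetical helper: assemble a fill-in-middle token sequence."""
    # Assumption: a negative id signals that the model defines no such token.
    if min(prefix_tok, suffix_tok, middle_tok) < 0:
        # No fim tokens detected: leave the prompt alone and do not append the suffix.
        return list(prompt_ids)
    # Assumed ordering: <prefix> prompt <suffix> suffix <middle>
    return [prefix_tok] + list(prompt_ids) + [suffix_tok] + list(suffix_ids) + [middle_tok]
```

The model is then expected to generate the missing middle section after the final marker token.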