Commit graph

820 commits

Author SHA1 Message Date
Sigbjørn Skjæret
dbcf64cf07
feat: Support SPM infill (#1492)
* Support SPM infill

* typo--

* one less layer of parenthesis necessary

* new required internals

* manually add bos/eos if model requires it

* add bos even when unknown

This is identical behaviour to llama.cpp

I guess any model that doesn't use BOS is recent enough to have the add_bos_token metadata.

* don't add bos/eos on non-infill pre-tokenized prompt

* add tokenizer hack to remove leading space in suffix

* I keep forgetting metadata are strings

* check if bos exists

* add example

* add cls/sep instead of bos/eos for WPM vocab

* simplify

* color-code filtered suffix

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-13 03:45:24 -04:00
Andrei Betlen
86a38ad4a0 chore: Bump version 2024-06-10 11:14:33 -04:00
Andrei Betlen
255e1b4495 feat: Update llama.cpp 2024-06-07 02:02:12 -04:00
nullname
d634efcdd9
feat: adding rpc_servers parameter to Llama class (#1477)
* passthru rpc_servers params

wip

* enable llama rpc by default

* convert string to byte

* add rpc package

* Revert "enable llama rpc by default"

This reverts commit 832c6dd56c979514cec5df224bf2d2014dccd790.

* update readme

* Only set rpc_servers when provided

* Add rpc servers to server options

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:38:21 -04:00
Asghar Ghorbani
6e0642ca19
fix: fix logprobs when BOS is not present (#1471)
* Fix lobprobs when BOS is not present

* Fix logprobs when bos is not available
2024-06-04 10:18:38 -04:00
Sigbjørn Skjæret
027f7bc678
fix: Avoid duplicate special tokens in chat formats (#1439)
* Templates sometimes have BOS in them, remove duplicate

* tokenize chat format prompts before completion

This is to ensure that we don't duplicate any special tokens.

Hopefully I amended the existing formats correctly?

* updated comment

* corrected a few

* add some missing internals

* proper bos/eos detection

* just let tokenizer do the job

* typo--

* align test with new response

* changed to a warning

* move to another PR

* Use python warnings module

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:15:41 -04:00
Andrei Betlen
c3ef41ba06 chore: Bump version 2024-06-04 00:49:24 -04:00
Andrei Betlen
cd3f1bb387 feat: Update llama.cpp 2024-06-04 00:35:47 -04:00
Andrei Betlen
6b018e00b1 misc: Improve llava error messages 2024-06-03 11:19:10 -04:00
Andrei Betlen
a6457ba74b Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-06-01 18:10:13 -04:00
Andrei Betlen
af3ed503e9 fix: Use numpy recarray for candidates data, fixes bug with temp < 0 2024-06-01 18:09:24 -04:00
Andrei Betlen
165b4dc6c1 fix: Fix typo in Llama3VisionAlphaChatHandler. Closes #1488 2024-05-29 02:29:44 -04:00
Andrei Betlen
91d05aba46 fix: adjust kv_override member names to match llama.cpp 2024-05-29 02:28:58 -04:00
Andrei Betlen
df45a4b3fe fix: fix string value kv_overrides. Closes #1487 2024-05-29 02:02:22 -04:00
Andrei Betlen
454c9bb1cb feat: Update llama.cpp 2024-05-27 10:51:57 -04:00
Andrei Betlen
a4c9ab885d chore: Bump version 2024-05-24 01:59:25 -04:00
Linghan Zhong
5cae1040e3
feat: Improve Llama.eval performance by avoiding list conversion (#1476)
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-24 01:49:44 -04:00
Andrei Betlen
087cc0b036 feat: Update llama.cpp 2024-05-24 01:43:36 -04:00
Andrei Betlen
5a595f035a feat: Update llama.cpp 2024-05-22 02:40:31 -04:00
Andrei Betlen
b564d05806 chore: Bump version 2024-05-16 00:41:21 -04:00
Andrei Betlen
d99a6ba607 fix: segfault for models without eos / bos tokens. Closes #1463 2024-05-16 00:37:27 -04:00
twaka
5212fb08ae
feat: add MinTokensLogitProcessor and min_tokens argument to server (#1333)
* implement min_tokens

* set default to 0

* pass min_tokens

* fix

* remove copy

* implement MinTokensLogitsProcessor

* format

* fix condition
2024-05-14 09:50:53 -04:00
Sigbjørn Skjæret
389e09c2f5
misc: Remove unnecessary metadata lookups (#1448)
Special tokens are already mapped from metadata by llama.cpp
2024-05-14 09:44:09 -04:00
Andrei Betlen
50f5c74ecf Update llama.cpp 2024-05-14 09:30:04 -04:00
Andrei Betlen
3c19faa0d4 chore: Bump version 2024-05-12 10:32:52 -04:00
Andrei Betlen
73165021bb chore: Bump version 2024-05-10 09:44:18 -04:00
Andrei Betlen
ac55d0a175 fix: Clear kv cache to avoid kv bug when image is evaluated first 2024-05-10 02:38:10 -04:00
Andrei Betlen
4badac3a60 chore: Bump version 2024-05-10 00:56:19 -04:00
Sigbjørn Skjæret
561e880654
fix(security): Render all jinja templates in immutable sandbox (#1441)
Chat templates are rendered with ImmutableSandboxedEnvironment in transformers so no need to do otherwise here.

Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-10 00:49:40 -04:00
Patrick Peng
b454f40a9a
Merge pull request from GHSA-56xg-wfcc-g829
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-10 00:47:56 -04:00
Sigbjørn Skjæret
5ab40e6167
feat: Support multiple chat templates - step 1 (#1396)
* Support multiple chat templates - step 1

As a first step, allow user to to select template from metadata with chat_format parameter in the form of `chat_template.name`.

* register chat templates to self.chat_formats instead of globally

* Don't expose internal chat handlers yet

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-09 09:49:09 -04:00
Andrei Betlen
bf66a283e8 chore: Bump version 2024-05-09 03:02:52 -04:00
Andrei Betlen
3757328b70 fix: free last image embed in llava chat handler 2024-05-08 22:16:18 -04:00
Andrei Betlen
77122638b4 fix: Make leading bos_token optional for image chat formats, fix nanollava system message 2024-05-08 13:12:31 -04:00
Andrei Betlen
2a39b99575 feat: Update llama.cpp 2024-05-08 08:42:22 -04:00
Andrei Betlen
9ce5cb376a chore: Bump version 2024-05-08 02:36:42 -04:00
Sigbjørn Skjæret
4a7122d22f
feat: fill-in-middle support (#1386)
* Proper fill-in-middle support

Use prefix/middle/suffix tokens when metadata is present in GGUF, like f.ex. in [this](https://huggingface.co/CISCai/CodeQwen1.5-7B-Chat-SOTA-GGUF) one.

* fall back to internal prefix/middle/suffix id

In some cases llama.cpp will make a guess at fim tokens, use them if there's no metadata.

* typo--

* don't insert special tokens that are not there in suffix

Note: add_bos is misnamed, it's actually add_special and can cause several special tokens to be added to the token list (the special parameter is actually parse_special).

* don't add/parse any special tokens when using fim

I've left original behavior when no fim tokens are found, but this should perhaps be re-evaluated.

* don't append suffix to prompt_tokens unless fim tokens are detected

* make sure we only do this for fim

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-08 02:26:22 -04:00
Andrei Betlen
228949c1f7 feat: Update llama.cpp 2024-05-08 02:22:15 -04:00
Sarunas Kalade
903b28adf5
fix: adding missing args in create_completion for functionary chat handler (#1430) 2024-05-08 02:21:27 -04:00
Bruno Alvisio
a50d24e3a7
fix: chat_format log where auto-detected format prints None (#1434) 2024-05-08 02:19:35 -04:00
Andrei Betlen
0318702cdc feat(server): Add support for setting root_path. Closes #1420 2024-05-05 12:49:31 -04:00
Andrei Betlen
3e2597eac8 feat: Update llama.cpp 2024-05-05 12:12:27 -04:00
Noam Gat
e0d7674e62
fix: detokenization case where first token does not start with a leading space (#1375)
* Fix tokenization edge case where llama output does not start with a space

See this notebook:
https://colab.research.google.com/drive/1Ooz11nFPk19zyJdMDx42CeesU8aWZMdI#scrollTo=oKpHw5PZ30uC

* Update _internals.py

Fixing to compare to b' ' instead of (str)' '

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-04 10:14:59 -04:00
Jeffrey Fong
1f56c648c3
feat: Implement streaming for Functionary v2 + Bug fixes (#1419)
* set up streaming for v2

* assert v2 streaming, fix tool_call vs function_call

* fix streaming with tool_choice/function_call

* make functions return 1 function call only when 'auto'

* fix

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-04 10:11:20 -04:00
Andrei Betlen
f9b7221c8f Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2024-05-03 19:07:54 -04:00
Andrei Betlen
9f7a85571a fix: Use memmove to copy str_value kv_override. Closes #1417 2024-05-03 19:07:50 -04:00
Andrei Betlen
0a454bebe6 feat(server): Remove temperature bounds checks for server. Closes #1384 2024-05-03 15:23:06 -04:00
Daniel Thuerck
2138561fab
fix(server): Propagate flash_attn to model load. (#1424) 2024-05-03 12:17:07 -04:00
Andrei Betlen
2117122396 chore: Bump version 2024-05-02 12:07:09 -04:00
Andrei Betlen
31b1d95a6c feat: Add llama-3-vision-alpha chat format 2024-05-02 11:32:18 -04:00