Andrei Betlen
3c19faa0d4
chore: Bump version
2024-05-12 10:32:52 -04:00
Andrei Betlen
3fe8e9a8f3
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-05-12 10:30:24 -04:00
Andrei Betlen
9dc5e20fb6
feat: Update llama.cpp
2024-05-12 10:30:23 -04:00
Peng Yu
1547202b77
docs: Fix typo in README.md (#1444)
2024-05-10 10:35:51 -04:00
Andrei Betlen
7f59856fa6
fix: Enable CUDA backend for llava. Closes #1324
2024-05-10 10:18:47 -04:00
Andrei Betlen
73165021bb
chore: Bump version
2024-05-10 09:44:18 -04:00
Andrei Betlen
eafb6ec5e8
feat: Update llama.cpp
2024-05-10 08:39:55 -04:00
Andrei Betlen
ac55d0a175
fix: Clear kv cache to avoid kv bug when image is evaluated first
2024-05-10 02:38:10 -04:00
Andrei Betlen
4badac3a60
chore: Bump version
2024-05-10 00:56:19 -04:00
Sigbjørn Skjæret
561e880654
fix(security): Render all jinja templates in immutable sandbox (#1441)
Chat templates are rendered with ImmutableSandboxedEnvironment in transformers, so there is no need to do otherwise here.
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-10 00:49:40 -04:00
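For context, the pattern this commit adopts, as a minimal sketch using jinja2's immutable sandbox; the template string is illustrative only, not the library's actual code:

```python
# Minimal sketch of the hardening above: render chat templates with
# jinja2's ImmutableSandboxedEnvironment rather than the default
# Environment, so templates cannot mutate objects or reach unsafe
# attributes. The template string below is illustrative only.
from jinja2.sandbox import ImmutableSandboxedEnvironment

env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
template = env.from_string(
    "{% for m in messages %}{{ m.role }}: {{ m.content }}\n{% endfor %}"
)
print(template.render(messages=[{"role": "user", "content": "Hi"}]))
```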
Patrick Peng
b454f40a9a
Merge pull request from GHSA-56xg-wfcc-g829
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-10 00:47:56 -04:00
Sigbjørn Skjæret
5ab40e6167
feat: Support multiple chat templates - step 1 (#1396)
* Support multiple chat templates - step 1
As a first step, allow the user to select a template from the metadata with the chat_format parameter in the form `chat_template.name`.
* register chat templates to self.chat_formats instead of globally
* Don't expose internal chat handlers yet
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-09 09:49:09 -04:00
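Assuming the `chat_template.name` convention described above, selection might look like the following; the template name `tool_use` is hypothetical and depends on what the GGUF metadata actually contains:

```python
from llama_cpp import Llama

# Hypothetical: select the GGUF-embedded template named "tool_use" via
# the chat_format parameter, using the chat_template.<name> convention.
llm = Llama(model_path="./model.gguf", chat_format="chat_template.tool_use")
```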
Andrei Betlen
bf66a283e8
chore: Bump version
2024-05-09 03:02:52 -04:00
Andrei Betlen
3757328b70
fix: free last image embed in llava chat handler
2024-05-08 22:16:18 -04:00
Andrei Betlen
77122638b4
fix: Make leading bos_token optional for image chat formats, fix nanollava system message
2024-05-08 13:12:31 -04:00
Andrei Betlen
2a39b99575
feat: Update llama.cpp
2024-05-08 08:42:22 -04:00
Andrei Betlen
9ce5cb376a
chore: Bump version
2024-05-08 02:36:42 -04:00
Sigbjørn Skjæret
4a7122d22f
feat: fill-in-middle support (#1386)
* Proper fill-in-middle support
Use prefix/middle/suffix tokens when metadata is present in the GGUF, e.g. in [this](https://huggingface.co/CISCai/CodeQwen1.5-7B-Chat-SOTA-GGUF) one.
* fall back to internal prefix/middle/suffix id
In some cases llama.cpp will make a guess at fim tokens; use them if there's no metadata.
* typo--
* don't insert special tokens that are not there in suffix
Note: add_bos is misnamed; it's actually add_special and can cause several special tokens to be added to the token list (the special parameter is actually parse_special).
* don't add/parse any special tokens when using fim
I've left the original behavior when no fim tokens are found, but this should perhaps be re-evaluated.
* don't append suffix to prompt_tokens unless fim tokens are detected
* make sure we only do this for fim
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-08 02:26:22 -04:00
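A hedged sketch of infilling with this feature via the `suffix` argument to completion; the model path and code fragments are placeholders, and the model's GGUF must carry (or llama.cpp must guess) the FIM tokens:

```python
from llama_cpp import Llama

llm = Llama(model_path="./codeqwen-fim.gguf")  # placeholder path

# prompt = code before the cursor, suffix = code after it; with FIM
# tokens available, the library builds a prefix/suffix/middle prompt.
out = llm.create_completion(
    prompt="def add(a, b):\n    ",
    suffix="\n    return result\n",
    max_tokens=32,
)
print(out["choices"][0]["text"])
```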
Andrei Betlen
228949c1f7
feat: Update llama.cpp
2024-05-08 02:22:15 -04:00
Sarunas Kalade
903b28adf5
fix: Add missing args in create_completion for functionary chat handler (#1430)
2024-05-08 02:21:27 -04:00
Ikko Eltociear Ashimine
07966b9ba7
docs: update README.md (#1432)
accomodate -> accommodate
2024-05-08 02:20:20 -04:00
Bruno Alvisio
a50d24e3a7
fix: chat_format log where auto-detected format prints None (#1434)
2024-05-08 02:19:35 -04:00
Andrei Betlen
0318702cdc
feat(server): Add support for setting root_path. Closes #1420
2024-05-05 12:49:31 -04:00
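A hypothetical sketch of where root_path matters: serving behind a reverse proxy at a sub-path. The `create_app` factory and settings class names are assumptions about the server module, while `root_path` itself is a standard uvicorn/FastAPI option:

```python
# Hypothetical launch sketch; create_app and the settings classes are
# assumed names for the server factory. root_path lets generated docs
# and routes work behind a proxy that strips a /llama prefix.
import uvicorn
from llama_cpp.server.app import create_app
from llama_cpp.server.settings import ModelSettings, ServerSettings

app = create_app(
    server_settings=ServerSettings(),
    model_settings=[ModelSettings(model="./model.gguf")],
)
uvicorn.run(app, host="0.0.0.0", port=8000, root_path="/llama")
```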
Olivier DEBAUCHE
3666833107
feat(ci): Add docker checks and check deps more frequently (#1426)
* Update dependabot.yml
Add github-actions update
* Update dependabot.yml
* Update dependabot.yml
2024-05-05 12:42:28 -04:00
Andrei Betlen
3e2597eac8
feat: Update llama.cpp
2024-05-05 12:12:27 -04:00
Noam Gat
e0d7674e62
fix: detokenization case where first token does not start with a leading space (#1375)
* Fix tokenization edge case where llama output does not start with a space
See this notebook:
https://colab.research.google.com/drive/1Ooz11nFPk19zyJdMDx42CeesU8aWZMdI#scrollTo=oKpHw5PZ30uC
* Update _internals.py
Fix the comparison to use b' ' instead of the str ' '
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-04 10:14:59 -04:00
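The gist of the byte-vs-str fix, as a standalone illustration (not the library's exact code):

```python
# Detokenized llama.cpp output is bytes, so a leading-space check
# against the str " " never matches; compare bytes to bytes instead.
output = b" Hello world"

print(output[:1] == " ")   # False: bytes and str never compare equal
print(output[:1] == b" ")  # True

# Strip the sentencepiece leading space only when it is actually there.
if output[:1] == b" ":
    output = output[1:]
print(output.decode("utf-8"))  # Hello world
```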
Jeffrey Fong
1f56c648c3
feat: Implement streaming for Functionary v2 + Bug fixes (#1419)
* set up streaming for v2
* assert v2 streaming, fix tool_call vs function_call
* fix streaming with tool_choice/function_call
* make functions return 1 function call only when 'auto'
* fix
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-04 10:11:20 -04:00
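Streaming tool calls with this change, sketched under the assumption that a Functionary v2 model and its chat handler are configured; the model path and tool schema are placeholders:

```python
from llama_cpp import Llama

# Placeholder path; Functionary models need the matching chat format
# (and, in practice, the model's own HF tokenizer) to be set up.
llm = Llama(model_path="./functionary-small-v2.gguf", chat_format="functionary-v2")

stream = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }],
    tool_choice="auto",
    stream=True,
)
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    if delta.get("tool_calls"):
        print(delta["tool_calls"])
```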
Andrei Betlen
f9b7221c8f
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2024-05-03 19:07:54 -04:00
Andrei Betlen
9f7a85571a
fix: Use memmove to copy str_value kv_override. Closes #1417
2024-05-03 19:07:50 -04:00
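The underlying pattern, as a standalone ctypes illustration rather than the library's code: copy the encoded string's bytes into a fixed-size C buffer with memmove instead of assigning across the ctypes boundary:

```python
import ctypes

# Standalone illustration: memmove the encoded value into a fixed-size
# C char buffer, which is how a str kv_override can be copied safely.
buf = (ctypes.c_char * 128)()
value = "my-model-name".encode("utf-8")

ctypes.memmove(buf, value, len(value))
print(buf[: len(value)].decode("utf-8"))  # my-model-name
```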
Andrei Betlen
0a454bebe6
feat(server): Remove temperature bounds checks for server. Closes #1384
2024-05-03 15:23:06 -04:00
Daniel Thuerck
2138561fab
fix(server): Propagate flash_attn to model load. (#1424)
2024-05-03 12:17:07 -04:00
Andrei Betlen
2117122396
chore: Bump version
2024-05-02 12:07:09 -04:00
Andrei Betlen
d75dea18db
feat: Update llama.cpp
2024-05-02 12:00:44 -04:00
Andrei Betlen
31b1d95a6c
feat: Add llama-3-vision-alpha chat format
2024-05-02 11:32:18 -04:00
Andrei Betlen
4f01c452b6
fix: Change default value of verbose in image chat format handlers to True to match Llama
2024-04-30 15:50:30 -04:00
Andrei Betlen
946156fb6c
feat: Update llama.cpp
2024-04-30 15:46:45 -04:00
Andrei Betlen
9286b5caac
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2024-04-30 15:45:36 -04:00
Andrei Betlen
f116175a5a
fix: Suppress all logs when verbose=False, use hardcoded filenos to work in Colab notebooks. Closes #796 Closes #729
2024-04-30 15:45:34 -04:00
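The hardcoded-fileno technique mentioned here, sketched generically (this is the standard fd-2 redirect pattern, not necessarily the library's exact code):

```python
import os

# Generic sketch: silence C-level stderr (fd 2) around a noisy native
# call. Hardcoding fd 2 works even in notebooks, where sys.stderr is a
# wrapper object without a usable fileno.
devnull = os.open(os.devnull, os.O_WRONLY)
saved = os.dup(2)
try:
    os.dup2(devnull, 2)
    # ... noisy native call goes here ...
finally:
    os.dup2(saved, 2)
    os.close(saved)
    os.close(devnull)
```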
Jonathan Soma
3226b3c5ef
fix: UTF-8 handling with grammars (#1415)
Use Python's built-in UTF-8 handling to get code points
2024-04-30 14:33:23 -04:00
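The approach named in the commit, as a tiny illustration: let Python's own UTF-8 decoder produce characters and take `ord()` of each to get code points, instead of hand-decoding multi-byte sequences:

```python
# Illustration: Python's UTF-8 decoder yields whole characters, so code
# points fall out of ord() without manual multi-byte handling.
data = "héllo ✓".encode("utf-8")
code_points = [ord(ch) for ch in data.decode("utf-8")]
print(code_points)  # [104, 233, 108, 108, 111, 32, 10003]
```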
Andrei Betlen
945c62c567
docs: Change all examples from interpreter style to script style.
2024-04-30 10:15:04 -04:00
Andrei Betlen
26478ab293
docs: Update README.md
2024-04-30 10:11:38 -04:00
Andrei Betlen
b14dd98922
chore: Bump version
2024-04-30 09:39:56 -04:00
Andrei Betlen
29b6e9a5c8
fix: wrong parameter for flash attention in pickle __getstate__
2024-04-30 09:32:47 -04:00
Andrei Betlen
22d77eefd2
feat: Add option to enable flash_attn to Llama params and ModelSettings
2024-04-30 09:29:16 -04:00
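Usage is a single constructor flag; a sketch with a placeholder model path, assuming a llama.cpp build and model that support flash attention:

```python
from llama_cpp import Llama

# flash_attn is the new constructor option (also exposed via the
# server's ModelSettings). Placeholder model path.
llm = Llama(model_path="./model.gguf", flash_attn=True)
```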
Andrei Betlen
8c2b24d5aa
feat: Update llama.cpp
2024-04-30 09:27:55 -04:00
Olivier DEBAUCHE
6332527a69
fix(ci): Fix build-and-release.yaml (#1413)
* Update build-and-release.yaml
* Update build-and-release.yaml
2024-04-30 09:16:14 -04:00
Andrei Betlen
c8cd8c17c6
docs: Update README to include CUDA 12.4 wheels
2024-04-30 03:12:46 -04:00
Andrei Betlen
f417cce28a
chore: Bump version
2024-04-30 03:11:02 -04:00
Andrei Betlen
3489ef09d3
fix: Ensure image renders before text in chat formats regardless of message content order.
2024-04-30 03:08:46 -04:00
Andrei Betlen
d03f15bb73
fix(ci): Fix bug in use of upload-artifact failing to merge multiple artifacts into a single release.
2024-04-30 02:58:55 -04:00