baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	454c9bb1cb	feat: Update llama.cpp	2024-05-27 10:51:57 -04:00
Andrei Betlen	a4c9ab885d	chore: Bump version	2024-05-24 01:59:25 -04:00
Linghan Zhong	5cae1040e3	feat: Improve Llama.eval performance by avoiding list conversion (#1476) Co-authored-by: Andrei <abetlen@gmail.com>	2024-05-24 01:49:44 -04:00
Andrei Betlen	087cc0b036	feat: Update llama.cpp	2024-05-24 01:43:36 -04:00
Andrei Betlen	5a595f035a	feat: Update llama.cpp	2024-05-22 02:40:31 -04:00
Andrei Betlen	b564d05806	chore: Bump version	2024-05-16 00:41:21 -04:00
Andrei Betlen	d99a6ba607	fix: segfault for models without eos / bos tokens. Closes #1463	2024-05-16 00:37:27 -04:00
twaka	5212fb08ae	feat: add MinTokensLogitsProcessor and min_tokens argument to server (#1333) * implement min_tokens * set default to 0 * pass min_tokens * fix * remove copy * implement MinTokensLogitsProcessor * format * fix condition	2024-05-14 09:50:53 -04:00
Sigbjørn Skjæret	389e09c2f5	misc: Remove unnecessary metadata lookups (#1448) Special tokens are already mapped from metadata by llama.cpp	2024-05-14 09:44:09 -04:00
Andrei Betlen	50f5c74ecf	Update llama.cpp	2024-05-14 09:30:04 -04:00
Andrei Betlen	3c19faa0d4	chore: Bump version	2024-05-12 10:32:52 -04:00
Andrei Betlen	73165021bb	chore: Bump version	2024-05-10 09:44:18 -04:00
Andrei Betlen	ac55d0a175	fix: Clear kv cache to avoid kv bug when image is evaluated first	2024-05-10 02:38:10 -04:00
Andrei Betlen	4badac3a60	chore: Bump version	2024-05-10 00:56:19 -04:00
Sigbjørn Skjæret	561e880654	fix(security): Render all jinja templates in immutable sandbox (#1441) Chat templates are rendered with ImmutableSandboxedEnvironment in transformers so no need to do otherwise here. Co-authored-by: Andrei <abetlen@gmail.com>	2024-05-10 00:49:40 -04:00
Patrick Peng	b454f40a9a	Merge pull request from GHSA-56xg-wfcc-g829 Co-authored-by: Andrei <abetlen@gmail.com>	2024-05-10 00:47:56 -04:00
Sigbjørn Skjæret	5ab40e6167	feat: Support multiple chat templates - step 1 (#1396) * Support multiple chat templates - step 1 As a first step, allow user to select template from metadata with chat_format parameter in the form of `chat_template.name`. * register chat templates to self.chat_formats instead of globally * Don't expose internal chat handlers yet --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-05-09 09:49:09 -04:00
Andrei Betlen	bf66a283e8	chore: Bump version	2024-05-09 03:02:52 -04:00
Andrei Betlen	3757328b70	fix: free last image embed in llava chat handler	2024-05-08 22:16:18 -04:00
Andrei Betlen	77122638b4	fix: Make leading bos_token optional for image chat formats, fix nanollava system message	2024-05-08 13:12:31 -04:00
Andrei Betlen	2a39b99575	feat: Update llama.cpp	2024-05-08 08:42:22 -04:00
Andrei Betlen	9ce5cb376a	chore: Bump version	2024-05-08 02:36:42 -04:00
Sigbjørn Skjæret	4a7122d22f	feat: fill-in-middle support (#1386) * Proper fill-in-middle support Use prefix/middle/suffix tokens when metadata is present in GGUF, like f.ex. in [this](https://huggingface.co/CISCai/CodeQwen1.5-7B-Chat-SOTA-GGUF) one. * fall back to internal prefix/middle/suffix id In some cases llama.cpp will make a guess at fim tokens, use them if there's no metadata. * typo-- * don't insert special tokens that are not there in suffix Note: add_bos is misnamed, it's actually add_special and can cause several special tokens to be added to the token list (the special parameter is actually parse_special). * don't add/parse any special tokens when using fim I've left original behavior when no fim tokens are found, but this should perhaps be re-evaluated. * don't append suffix to prompt_tokens unless fim tokens are detected * make sure we only do this for fim --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-05-08 02:26:22 -04:00
Andrei Betlen	228949c1f7	feat: Update llama.cpp	2024-05-08 02:22:15 -04:00
Sarunas Kalade	903b28adf5	fix: adding missing args in create_completion for functionary chat handler (#1430)	2024-05-08 02:21:27 -04:00
Bruno Alvisio	a50d24e3a7	fix: chat_format log where auto-detected format prints `None` (#1434)	2024-05-08 02:19:35 -04:00
Andrei Betlen	0318702cdc	feat(server): Add support for setting root_path. Closes #1420	2024-05-05 12:49:31 -04:00
Andrei Betlen	3e2597eac8	feat: Update llama.cpp	2024-05-05 12:12:27 -04:00
Noam Gat	e0d7674e62	fix: detokenization case where first token does not start with a leading space (#1375) * Fix tokenization edge case where llama output does not start with a space See this notebook: https://colab.research.google.com/drive/1Ooz11nFPk19zyJdMDx42CeesU8aWZMdI#scrollTo=oKpHw5PZ30uC * Update _internals.py Fixing to compare to b' ' instead of (str)' ' --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-05-04 10:14:59 -04:00
Jeffrey Fong	1f56c648c3	feat: Implement streaming for Functionary v2 + Bug fixes (#1419) * set up streaming for v2 * assert v2 streaming, fix tool_call vs function_call * fix streaming with tool_choice/function_call * make functions return 1 function call only when 'auto' * fix --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-05-04 10:11:20 -04:00
Andrei Betlen	f9b7221c8f	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2024-05-03 19:07:54 -04:00
Andrei Betlen	9f7a85571a	fix: Use memmove to copy str_value kv_override. Closes #1417	2024-05-03 19:07:50 -04:00
Andrei Betlen	0a454bebe6	feat(server): Remove temperature bounds checks for server. Closes #1384	2024-05-03 15:23:06 -04:00
Daniel Thuerck	2138561fab	fix(server): Propagate `flash_attn` to model load. (#1424)	2024-05-03 12:17:07 -04:00
Andrei Betlen	2117122396	chore: Bump version	2024-05-02 12:07:09 -04:00
Andrei Betlen	31b1d95a6c	feat: Add llama-3-vision-alpha chat format	2024-05-02 11:32:18 -04:00
Andrei Betlen	4f01c452b6	fix: Change default value of verbose in image chat format handlers to True to match Llama	2024-04-30 15:50:30 -04:00
Andrei Betlen	9286b5caac	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2024-04-30 15:45:36 -04:00
Andrei Betlen	f116175a5a	fix: Suppress all logs when verbose=False, use hardcoded fileno's to work in colab notebooks. Closes #796 Closes #729	2024-04-30 15:45:34 -04:00
Jonathan Soma	3226b3c5ef	fix: UTF-8 handling with grammars (#1415) Use Python's built-in UTF-8 handling to get code points	2024-04-30 14:33:23 -04:00
Andrei Betlen	b14dd98922	chore: Bump version	2024-04-30 09:39:56 -04:00
Andrei Betlen	29b6e9a5c8	fix: wrong parameter for flash attention in pickle __getstate__	2024-04-30 09:32:47 -04:00
Andrei Betlen	22d77eefd2	feat: Add option to enable `flash_attn` to Llama params and ModelSettings	2024-04-30 09:29:16 -04:00
Andrei Betlen	8c2b24d5aa	feat: Update llama.cpp	2024-04-30 09:27:55 -04:00
Andrei Betlen	f417cce28a	chore: Bump version	2024-04-30 03:11:02 -04:00
Andrei Betlen	3489ef09d3	fix: Ensure image renders before text in chat formats regardless of message content order.	2024-04-30 03:08:46 -04:00
Andrei Betlen	26c7876ba0	chore: Bump version	2024-04-30 01:48:40 -04:00
Andrei	fe2da09538	feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) (#1147) * Test dummy image tags in chat templates * Format and improve types for llava_cpp.py * Add from_pretrained support to llava chat format. * Refactor llava chat format to use a jinja2 * Revert chat format test * Add moondream support (wip) * Update moondream chat format * Update moondream chat format * Update moondream prompt * Add function calling support * Cache last image embed * Add Llava1.6 support * Add nanollava support * Add obsidian support * Remove unnecessary import * Re-order multimodal chat formats * Logits all no longer required for multi-modal models * Update README.md * Update docs * Update README * Fix typo * Update README * Fix typo	2024-04-30 01:35:38 -04:00
Andrei Betlen	97fb860eba	feat: Update llama.cpp	2024-04-29 23:34:55 -04:00
Andrei Betlen	a411612b38	feat: Add support for str type kv_overrides	2024-04-27 23:42:19 -04:00


806 commits