baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Sigbjørn Skjæret	027f7bc678	fix: Avoid duplicate special tokens in chat formats (#1439 ) * Templates sometimes have BOS in them, remove duplicate * tokenize chat format prompts before completion This is to ensure that we don't duplicate any special tokens. Hopefully I amended the existing formats correctly? * updated comment * corrected a few * add some missing internals * proper bos/eos detection * just let tokenizer do the job * typo-- * align test with new response * changed to a warning * move to another PR * Use python warnings module --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2024-06-04 10:15:41 -04:00
Andrei Betlen	6b018e00b1	misc: Improve llava error messages	2024-06-03 11:19:10 -04:00
Andrei Betlen	165b4dc6c1	fix: Fix typo in Llama3VisionAlphaChatHandler. Closes #1488	2024-05-29 02:29:44 -04:00
Andrei Betlen	ac55d0a175	fix: Clear kv cache to avoid kv bug when image is evaluated first	2024-05-10 02:38:10 -04:00
Sigbjørn Skjæret	561e880654	fix(security): Render all jinja templates in immutable sandbox (#1441 ) Chat templates are rendered with ImmutableSandboxedEnvironment in transformers so no need to do otherwise here. Co-authored-by: Andrei <abetlen@gmail.com>	2024-05-10 00:49:40 -04:00
Patrick Peng	b454f40a9a	Merge pull request from GHSA-56xg-wfcc-g829 Co-authored-by: Andrei <abetlen@gmail.com>	2024-05-10 00:47:56 -04:00
Andrei Betlen	3757328b70	fix: free last image embed in llava chat handler	2024-05-08 22:16:18 -04:00
Andrei Betlen	77122638b4	fix: Make leading bos_token optional for image chat formats, fix nanollava system message	2024-05-08 13:12:31 -04:00
Sarunas Kalade	903b28adf5	fix: adding missing args in create_completion for functionary chat handler (#1430 )	2024-05-08 02:21:27 -04:00
Jeffrey Fong	1f56c648c3	feat: Implement streaming for Functionary v2 + Bug fixes (#1419 ) * set up streaming for v2 * assert v2 streaming, fix tool_call vs function_call * fix streaming with tool_choice/function_call * make functions return 1 function call only when 'auto' * fix --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-05-04 10:11:20 -04:00
Andrei Betlen	31b1d95a6c	feat: Add llama-3-vision-alpha chat format	2024-05-02 11:32:18 -04:00
Andrei Betlen	4f01c452b6	fix: Change default verbose value of verbose in image chat format handlers to True to match Llama	2024-04-30 15:50:30 -04:00
Andrei Betlen	3489ef09d3	fix: Ensure image renders before text in chat formats regardless of message content order.	2024-04-30 03:08:46 -04:00
Andrei	fe2da09538	feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) (#1147 ) * Test dummy image tags in chat templates * Format and improve types for llava_cpp.py * Add from_pretrained support to llava chat format. * Refactor llava chat format to use a jinja2 * Revert chat format test * Add moondream support (wip) * Update moondream chat format * Update moondream chat format * Update moondream prompt * Add function calling support * Cache last image embed * Add Llava1.6 support * Add nanollava support * Add obisidian support * Remove unnecessary import * Re-order multimodal chat formats * Logits all no longer required for multi-modal models * Update README.md * Update docs * Update README * Fix typo * Update README * Fix typo	2024-04-30 01:35:38 -04:00
Jeffrey Fong	f178636e1b	fix: Functionary bug fixes (#1385 ) * fix completion tokens tracking, prompt forming * fix 'function_call' and 'tool_calls' depending on 'functions' and 'tools', incompatibility with python 3.8 * Updated README * fix for openai server compatibility --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-04-27 20:49:52 -04:00
abk16	8559e8ce88	feat: Add Llama-3 chat format (#1371 ) * feat: Add Llama-3 chat format * feat: Auto-detect Llama-3 chat format from gguf template * feat: Update llama.cpp to b2715 Includes proper Llama-3 <\|eot_id\|> token handling. --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2024-04-23 02:33:29 -04:00
Andrei Betlen	cc81afebf0	feat: Add stopping_criteria to ChatFormatter, allow stopping on arbitrary token ids, fixes llama3 instruct	2024-04-20 00:00:53 -04:00
Lucca Zenóbio	4f42664955	feat: update grammar schema converter to match llama.cpp (#1353 ) * feat: improve function calling * feat:grammar * fix * fix * fix	2024-04-18 01:36:25 -04:00
Andrei Betlen	fa4bb0cf81	Revert "feat: Update json to grammar (#1350 )" This reverts commit `610a592f70`.	2024-04-17 16:18:16 -04:00
Lucca Zenóbio	610a592f70	feat: Update json to grammar (#1350 ) * feat: improve function calling * feat:grammar	2024-04-17 10:10:21 -04:00
Andrei Betlen	bb65b4d764	fix: pass correct type to chat handlers for chat completion logprobs	2024-04-10 03:41:55 -04:00
Andrei Betlen	1ae3abbcc3	fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 Closes #1314	2024-04-05 10:51:44 -04:00
windspirit95	aa9f1ae011	feat: Add logprobs support to chat completions (#1311 ) * Add logprobs return in ChatCompletionResponse * Fix duplicate field * Set default to false * Simplify check * Add server example --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2024-03-31 13:30:13 -04:00
Andrei Betlen	c1325dcdfb	fix: tool_call missing first token.	2024-03-22 23:44:04 -04:00
Andrei	60d8498f21	feat: Add tools/functions variables to Jinja2ChatFormatter, add function response formatting for all simple chat formats (#1273 ) * Add tools/functions variables to Jinja2ChatFormatter Also fixed missing tools/tool_choices parameters in chat_formatter_to_chat_completion_handler(). * Set grammar when doing explicit function calling * Add function / tool response for all chat formats --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2024-03-19 04:55:57 -04:00
Jeffrey Fong	8a60c7bc8c	fix: Fix and optimize functionary chat handler (#1282 ) * fix functionary chat logic * further fixes --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-03-18 10:40:57 -04:00
Andrei Betlen	20e6815252	fix: json mode	2024-03-15 12:58:34 -04:00
Kevin Cao	1f3156d4f2	fix: Check for existence of clip model path (#1264 )	2024-03-08 21:00:10 -05:00
Andrei Betlen	dbaba3059d	fix: positional arguments only for low-level api	2024-02-26 11:31:11 -05:00
Andrei Betlen	78e536dcfe	fix: typo	2024-02-26 11:14:26 -05:00
Andrei Betlen	8383a9e562	fix: llava this function takes at least 4 arguments (0 given)	2024-02-26 11:03:20 -05:00
Luke Stanley	858496224e	feat: Auto detect Mixtral's slightly different format (#1214 )	2024-02-23 11:27:38 -05:00
Alvaro Bartolome	251a8a2cad	feat: Add Google's Gemma formatting via `chat_format="gemma"` (#1210 ) * Add Google's Gemma formatting via `chat_format="gemma"` * Replace `raise ValueError` with `logger.debug` Co-authored-by: Andrei <abetlen@gmail.com> --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-02-23 04:40:52 -05:00
Andrei Betlen	07a783779a	fix: Update openbuddy prompt format. Closes #1155	2024-02-13 23:57:10 -05:00
Andrei Betlen	345215a76c	fix: more chatml-function-calling fixes	2024-02-13 23:02:50 -05:00
Andrei Betlen	68fb71b6a2	fix: missing generation_prompt in chatml-function-calling	2024-02-13 03:24:41 -05:00
Andrei Betlen	4b0e3320bd	fix: minor formatting bugs for chatml-function-calling	2024-02-13 03:11:35 -05:00
Andrei	153a0049d9	feat: Generic chatml Function Calling (#957 ) * Add demo notebook * Add initial chat handler * Update OpenAI types * Add generic chatml function calling (wip) * Update chatml generic function calling. * Progress on auto-tool calls * fix streaming functions * Remove print statements * fix: Suppress output from llama.cpp init and grammar creation * Add OpenAI v1 python api compatible chat completion function * Support non-streaming multi-tool calls * Format * Include function_call in response.	2024-02-12 15:56:07 -05:00
Jeffrey Fong	901827013b	feat: Integrate functionary v1.4 and v2 models + add custom tokenizer support to Llama class (#1078 ) * convert functionary-v1 chat handler to use hf autotokenizer * add hf_tokenizer + inteegrate functionary-v1.4 prompt template * integrate functionary v2 prompt template * update readme * set up parallel function calling wip * set up parallel function calling * Update README.md * Update README.md * refactor tokenizers * include old functionary handler for backward compatibility * add hf_tokenizer_path in server ModelSettings * convert functionary-v1 chat handler to use hf autotokenizer * add hf_tokenizer + inteegrate functionary-v1.4 prompt template * integrate functionary v2 prompt template * update readme * set up parallel function calling wip * resolve merge conflict * Update README.md * Update README.md * refactor tokenizers * include old functionary handler for backward compatibility * add hf_tokenizer_path in server ModelSettings * Cleanup PR, fix breaking changes * Use hf_pretrained_model_name_or_path for tokenizer * fix hf tokenizer in streaming * update README * refactor offset mapping --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-02-07 20:07:03 -05:00
Andrei Betlen	078cca0361	fix: Pass raise_exception and add_generation_prompt to jinja2 chat template	2024-01-31 08:42:21 -05:00
Andrei	da003d8768	Automatically set chat format from gguf (#1110 ) * Use jinja formatter to load chat format from gguf * Fix off-by-one error in metadata loader * Implement chat format auto-detection	2024-01-29 14:22:23 -05:00
Andrei Betlen	9ae5819ee4	Add chat format test.	2024-01-29 00:59:01 -05:00
Rafaelblsilva	ce38dbdf07	Add mistral instruct chat format as "mistral-instruct" (#799 ) * Added mistral instruct chat format as "mistral" * Fix stop sequence (merge issue) * Update chat format name to `mistral-instruct` --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-01-29 00:34:42 -05:00
Andrei	d8f6914f45	Add json schema mode (#1122 ) * Add json schema mode * Add llava chat format support	2024-01-27 16:52:18 -05:00
Andrei Betlen	5b982d0f8c	fix: use both eos and bos tokens as stop sequences for hf-tokenizer-config chat format.	2024-01-22 08:32:48 -05:00
Andrei Betlen	7f3209b1eb	feat: Add add_generation_prompt option for jinja2chatformatter.	2024-01-21 18:37:24 -05:00
Andrei Betlen	be09318c26	feat: Add Jinja2ChatFormatter	2024-01-19 15:04:42 -05:00
Andrei Betlen	b8fc1c7d83	feat: Add ability to load chat format from huggingface autotokenizer or tokenizer_config.json files.	2024-01-18 21:21:37 -05:00
Fedor Moiseev	907b9e9d42	Add Saiga chat format. (#1050 )	2024-01-04 18:12:58 -05:00
xaviviro	cf743ec5d3	Added ChatGLM chat format (#1059 ) Co-authored-by: Xavier Vinaixa Rosello <xaviviro@MacBook-Pro-de-Xavier.local>	2024-01-04 18:12:02 -05:00

1 2

76 commits