baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	20ea6fd7d6	chore: Bump version	2024-02-23 12:38:36 -05:00
Andrei Betlen	47bad30dd7	fix: LlamaHFTokenizer now receives pre_tokens	2024-02-23 12:23:24 -05:00
Andrei Betlen	ded5d627a5	chore: Bump version	2024-02-23 11:32:43 -05:00
Luke Stanley	858496224e	feat: Auto detect Mixtral's slightly different format (#1214 )	2024-02-23 11:27:38 -05:00
Andrei Betlen	db776a885c	fix: module 'llama_cpp.llama_cpp' has no attribute 'c_uint8'	2024-02-23 11:24:53 -05:00
Andrei Betlen	427d816ebf	chore: Bump version	2024-02-23 04:54:08 -05:00
Alvaro Bartolome	251a8a2cad	feat: Add Google's Gemma formatting via `chat_format="gemma"` (#1210 ) * Add Google's Gemma formatting via `chat_format="gemma"` * Replace `raise ValueError` with `logger.debug` Co-authored-by: Andrei <abetlen@gmail.com> --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-02-23 04:40:52 -05:00
Andrei Betlen	b9aca612af	misc: use typesafe byref for internal classes	2024-02-23 03:40:07 -05:00
Andrei Betlen	a0ce429dc0	misc: use decorator to bind low level api functions, fixes docs	2024-02-23 03:39:38 -05:00
Andrei Betlen	e10af30cf1	fix: TypeAlias import error	2024-02-22 03:27:28 -05:00
Andrei Betlen	3561ebf536	Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main	2024-02-22 03:25:13 -05:00
Andrei Betlen	aefcb8f71a	misc: additional type annotations for low level api	2024-02-22 02:00:09 -05:00
Andrei Betlen	3921e10770	feat: support minItems/maxItems in JSON grammar converter (by @nopperl)	2024-02-22 00:17:06 -05:00
Andrei Betlen	e6d6260a91	fix: Update from_pretrained defaults to match hf_hub_download	2024-02-22 00:10:23 -05:00
Andrei Betlen	dd22010e85	fix: Raise exceptions when llama model or context fails to load	2024-02-22 00:09:45 -05:00
Andrei Betlen	3632241e98	chore: Bump version	2024-02-21 23:09:13 -05:00
Andrei Betlen	0653e15c20	feat: Update llama.cpp	2024-02-21 23:04:52 -05:00
Andrei Betlen	7981e9ce1e	chore: Bump version	2024-02-21 16:30:59 -05:00
Andrei	7f51b6071f	feat(low-level-api): Improve API static type-safety and performance (#1205 )	2024-02-21 16:25:38 -05:00
Andrei	0f8aa4ab5c	feat: Pull models directly from huggingface (#1206 ) * Add from_pretrained method to Llama class * Update docs * Merge filename and pattern	2024-02-21 16:25:10 -05:00
Andrei Betlen	e42f62c247	chore: Bump version	2024-02-21 11:09:40 -05:00
Andrei Betlen	4edde21b3d	feat: Update llama.cpp	2024-02-21 11:05:58 -05:00
Andrei Betlen	6225f027e5	feat: Update llama.cpp	2024-02-19 04:11:34 -05:00
Andrei Betlen	748c0ce057	feat: Update llama.cpp	2024-02-18 21:30:36 -05:00
Andrei Betlen	53f6f5f415	fix: self.numa missing	2024-02-17 01:02:33 -05:00
Andrei Betlen	fdce078cb9	feat: Update llama.cpp	2024-02-17 00:37:51 -05:00
Andrei Betlen	f736827b9b	chore: Bump version	2024-02-15 23:10:50 -05:00
Andrei Betlen	0ce66bc080	fix: create_embedding broken response for input type str	2024-02-15 16:09:48 -05:00
khimaros	ea1f88dd29	fix: Use '\n' separator for EventSourceResponse (#1188 ) this fixes compatibility with some OpenAI clients, including BetterChatGPT (https://github.com/ztjhz/BetterChatGPT/issues/537). Co-authored-by: Andrei <abetlen@gmail.com>	2024-02-15 15:20:13 -05:00
Andrei Betlen	a5cfeb7763	feat: Update llama.cpp	2024-02-15 15:17:30 -05:00
Douglas Hanley	7bb91f025f	fix: Incorporate embedding pooling layer fixes (#1194 ) * remove division by token count * truncate to n_batch, not n_ctx	2024-02-15 15:16:30 -05:00
Andrei Betlen	ae71ad1a14	Bump version	2024-02-14 04:31:42 -05:00
Douglas Hanley	d7a67917ba	feat: Support batch embeddings (#1186 ) * handle batched embeddings * fix normalization issue * fix type hints, ensure no breaking changes to embed * Clear kv cache / reset internal state after embedding complete --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-02-14 04:26:09 -05:00
Andrei Betlen	7b9960d1cb	Update llama.cpp	2024-02-14 03:47:21 -05:00
Andrei Betlen	6943bab6d8	fix: destructor exception where internal classes are missing some uninitialized attributes	2024-02-14 03:38:41 -05:00
Andrei Betlen	07a783779a	fix: Update openbuddy prompt format. Closes #1155	2024-02-13 23:57:10 -05:00
Andrei Betlen	345215a76c	fix: more chatml-function-calling fixes	2024-02-13 23:02:50 -05:00
Andrei Betlen	b1637c2319	Bump version	2024-02-13 12:35:04 -05:00
Andrew Lapp	d6be5333e1	fix: sample idx off-by-one error for logit_processors (#1179 ) * fix sample_idx off-by-one error * self._scores is indexed differently, only modify the index within self._input_ids --------- Co-authored-by: Andrew Lapp <andrew@rew.la> Co-authored-by: Andrei <abetlen@gmail.com>	2024-02-13 12:26:07 -05:00
Andrei Betlen	f7cdf78788	Update llama.cpp	2024-02-13 12:24:00 -05:00
Andrei Betlen	68fb71b6a2	fix: missing generation_prompt in chatml-function-calling	2024-02-13 03:24:41 -05:00
Andrei Betlen	4b0e3320bd	fix: minor formatting bugs for chatml-function-calling	2024-02-13 03:11:35 -05:00
Andrei Betlen	6fe8b427e1	Bump version	2024-02-13 02:46:52 -05:00
Andrei Betlen	d1822fed6b	fix: Don't change order of json schema object properties unless prop_order is passed, Closes #1180	2024-02-13 02:44:00 -05:00
Andrei Betlen	d605875772	Bump version	2024-02-12 16:28:30 -05:00
Andrei Betlen	cb791716b4	fix: Always set logits_all = True when using speculative decoding	2024-02-12 16:19:05 -05:00
Andrei	153a0049d9	feat: Generic chatml Function Calling (#957 ) * Add demo notebook * Add initial chat handler * Update OpenAI types * Add generic chatml function calling (wip) * Update chatml generic function calling. * Progress on auto-tool calls * fix streaming functions * Remove print statements * fix: Suppress output from llama.cpp init and grammar creation * Add OpenAI v1 python api compatible chat completion function * Support non-streaming multi-tool calls * Format * Include function_call in response.	2024-02-12 15:56:07 -05:00
Andrei Betlen	69413ce08e	Update llama.cpp	2024-02-11 19:00:17 -05:00
Connor	a05d90446f	fix: Circular dependency preventing early Llama object free (#1176 ) commit `901827013b` introduced a cyclic dependency within Llama objects. That change causes old models to linger in memory longer than necessary, thereby creating memory bloat in most applications attempting to switch between models at runtime. This patch simply removes the problematic line, allowing models to deallocate without relying on GC. One might also consider combining `weakref.ref` with a `@property` if the `llama` attribute is absolutely necessary to expose in the tokenizer class.	2024-02-11 13:57:57 -05:00
Andrei Betlen	4abb8c9386	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2024-02-09 13:32:31 -05:00
Andrei Betlen	e16f06e6eb	fix: revert _create_completions.	2024-02-09 02:02:13 -05:00
Andrei Betlen	85d3374b4d	fix: broken import	2024-02-08 01:13:28 -05:00
Andrei Betlen	b5fca911b5	feat: Move tokenizer to own module	2024-02-08 01:08:18 -05:00
Jeffrey Fong	901827013b	feat: Integrate functionary v1.4 and v2 models + add custom tokenizer support to Llama class (#1078 ) * convert functionary-v1 chat handler to use hf autotokenizer * add hf_tokenizer + integrate functionary-v1.4 prompt template * integrate functionary v2 prompt template * update readme * set up parallel function calling wip * set up parallel function calling * Update README.md * Update README.md * refactor tokenizers * include old functionary handler for backward compatibility * add hf_tokenizer_path in server ModelSettings * convert functionary-v1 chat handler to use hf autotokenizer * add hf_tokenizer + integrate functionary-v1.4 prompt template * integrate functionary v2 prompt template * update readme * set up parallel function calling wip * resolve merge conflict * Update README.md * Update README.md * refactor tokenizers * include old functionary handler for backward compatibility * add hf_tokenizer_path in server ModelSettings * Cleanup PR, fix breaking changes * Use hf_pretrained_model_name_or_path for tokenizer * fix hf tokenizer in streaming * update README * refactor offset mapping --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-02-07 20:07:03 -05:00
Andrei Betlen	34f31040f6	Bump version	2024-02-06 12:47:59 -05:00
Andrei Betlen	59760c85ed	fix: Use llama_log_callback to avoid suppress_stdout_stderr	2024-02-05 21:52:12 -05:00
Andrei Betlen	3553b14670	Update llama.cpp	2024-02-05 13:26:50 -05:00
Andrei	7467f129e5	Revert "Fix: fileno error google colab (#729 ) (#1156 )" (#1157 ) This reverts commit `bebfba0f08`.	2024-02-02 12:18:55 -05:00
Dulsara	bebfba0f08	Fix: fileno error google colab (#729 ) (#1156 ) Instead of using a devnull just made a dummy class with a 'write()' method that does nothing.	2024-02-02 12:05:46 -05:00
Andrei Betlen	3322eadbf3	Bump version	2024-01-31 15:10:18 -05:00
Andrei	fb762a6041	Add speculative decoding (#1120 ) * Add draft model param to llama class, implement basic prompt lookup decoding draft model * Use samplingcontext for sampling * Use 1d array * Use draft model for sampling * Fix dumb mistake * Allow for later extensions to the LlamaDraftModel api * Cleanup * Adaptive candidate prediction * Update implementation to match hf transformers * Tuning * Fix bug where last token was not used for ngram prediction * Remove heuristic for num_pred_tokens (no benefit) * fix: n_candidates bug. * Add draft_model_num_pred_tokens server setting * Cleanup * Update README	2024-01-31 14:08:14 -05:00
Andrei Betlen	71e3e4c435	Update llama.cpp	2024-01-31 10:41:42 -05:00
Andrei Betlen	078cca0361	fix: Pass raise_exception and add_generation_prompt to jinja2 chat template	2024-01-31 08:42:21 -05:00
Andrei Betlen	bf9e824922	Bump version	2024-01-30 12:27:27 -05:00
Andrei Betlen	011cd84ded	Update llama.cpp	2024-01-30 09:48:09 -05:00
Andrei	da003d8768	Automatically set chat format from gguf (#1110 ) * Use jinja formatter to load chat format from gguf * Fix off-by-one error in metadata loader * Implement chat format auto-detection	2024-01-29 14:22:23 -05:00
Andrei Betlen	464af5b39f	Bump version	2024-01-29 10:46:04 -05:00
Andrei Betlen	9ae5819ee4	Add chat format test.	2024-01-29 00:59:01 -05:00
Rafaelblsilva	ce38dbdf07	Add mistral instruct chat format as "mistral-instruct" (#799 ) * Added mistral instruct chat format as "mistral" * Fix stop sequence (merge issue) * Update chat format name to `mistral-instruct` --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-01-29 00:34:42 -05:00
Andrei Betlen	52c4a84faf	Bump version	2024-01-28 19:35:37 -05:00
Andrei Betlen	c1d0fff8a9	Bump version	2024-01-27 18:36:56 -05:00
Andrei	d8f6914f45	Add json schema mode (#1122 ) * Add json schema mode * Add llava chat format support	2024-01-27 16:52:18 -05:00
Andrei Betlen	35918873b4	Update llama.cpp	2024-01-26 11:45:48 -05:00
Andrei Betlen	f5cc6b3053	Bump version	2024-01-25 11:28:16 -05:00
Andrei Betlen	cde7514c3d	feat(server): include llama-cpp-python version in openapi spec	2024-01-25 11:23:18 -05:00
Andrei Betlen	c970d41a85	fix: llama_log_set should be able to accept null pointer	2024-01-24 10:38:30 -05:00
Andrei Betlen	9677a1f2c8	fix: Check order	2024-01-23 22:28:03 -05:00
Andrei Betlen	4d6b2f7b91	fix: format	2024-01-23 22:08:27 -05:00
Phil H	fe5d6ea648	fix: GGUF metadata KV overrides, re #1011 (#1116 ) * kv overrides another attempt * add sentinel element, simplify array population * ensure sentinel element is zeroed	2024-01-23 22:00:38 -05:00
Andrei Betlen	fcdf337d84	Update llama.cpp	2024-01-22 11:25:11 -05:00
Andrei Betlen	5b982d0f8c	fix: use both eos and bos tokens as stop sequences for hf-tokenizer-config chat format.	2024-01-22 08:32:48 -05:00
Andrei Betlen	2ce0b8aa2c	Bump version	2024-01-21 20:30:24 -05:00
Andrei Betlen	d3f5528ca8	fix: from_json_schema oneof/anyof bug. Closes #1097	2024-01-21 19:06:53 -05:00
Andrei Betlen	24f39454e9	fix: pass chat handler not chat formatter for huggingface autotokenizer and tokenizer_config formats.	2024-01-21 18:38:04 -05:00
Andrei Betlen	7f3209b1eb	feat: Add add_generation_prompt option for jinja2chatformatter.	2024-01-21 18:37:24 -05:00
Andrei Betlen	be09318c26	feat: Add Jinja2ChatFormatter	2024-01-19 15:04:42 -05:00
Andrei Betlen	5a34c57e54	feat: Expose gguf model metadata in metadata property	2024-01-19 10:46:03 -05:00
Andrei Betlen	833a7f1a86	Bump version	2024-01-19 09:03:35 -05:00
Andrei Betlen	3babe3512c	Fix mirostat sampling	2024-01-19 08:31:59 -05:00
Andrei Betlen	141293a75b	Fix python3.8 support	2024-01-19 08:17:49 -05:00
Andrei Betlen	656f3d8968	Bump version	2024-01-18 21:30:36 -05:00
Andrei Betlen	89cce50f8c	Update llama.cpp	2024-01-18 21:21:49 -05:00
Andrei Betlen	b8fc1c7d83	feat: Add ability to load chat format from huggingface autotokenizer or tokenizer_config.json files.	2024-01-18 21:21:37 -05:00
Andrei Betlen	48c3b77e6f	Offload KQV by default	2024-01-18 11:08:57 -05:00
Austin	6bfe98bd80	Integration of Jinja2 Templating (#875 ) * feat: Add support for jinja templating Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * fix: Refactor chat formatter and update interface for jinja templates - Simplify the `llama2_template` in `llama_jinja_format.py` by removing unnecessary line breaks for readability without affecting functionality. - Update `ChatFormatterInterface` constructor to accept a more generic `Optional[object]` type for the template parameter, enhancing flexibility. - Introduce a `template` property to `ChatFormatterInterface` for standardized access to the template string. - Replace `MetaSingleton` metaclass with `Singleton` for the `ChatFormatterFactory` to streamline the singleton implementation. These changes enhance code readability, maintain usability, and ensure consistency in the chat formatter's design pattern usage. * Add outline for Jinja2 templating integration documentation Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * Add jinja2 as a dependency with version range for Hugging Face transformers compatibility Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * Update jinja2 version constraint for mkdocs-material compatibility Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * Fix attribute name in AutoChatFormatter - Changed attribute name from `self._renderer` to `self._environment` --------- Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>	2024-01-17 09:47:52 -05:00
Andrei Betlen	7b46bb5a78	Re-order classes in llama.py	2024-01-17 09:16:13 -05:00
Andrei Betlen	cc4630e66f	Move helper classes to _internals submodule	2024-01-17 09:14:00 -05:00
Andrei Betlen	3b92419132	Move cache classes to llama_cache submodule.	2024-01-17 09:09:12 -05:00
Kyle Mistele	9c36688b33	fix(cli): allow passing n_ctx=0 to openAI API server args to use model n_ctx_train field per #1015 (#1093 )	2024-01-16 18:54:06 -05:00
anil	cfb7da98ed	Support Accept text/event-stream in chat and completion endpoints, resolves #1083 (#1088 ) Co-authored-by: Anil Pathak <anil@heyday.com> Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2024-01-16 12:52:52 -05:00

719 commits