baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	cb791716b4	fix: Always set logits_all = True when using speculative decoding	2024-02-12 16:19:05 -05:00
Andrei	153a0049d9	feat: Generic chatml Function Calling (#957 ) * Add demo notebook * Add initial chat handler * Update OpenAI types * Add generic chatml function calling (wip) * Update chatml generic function calling. * Progress on auto-tool calls * fix streaming functions * Remove print statements * fix: Suppress output from llama.cpp init and grammar creation * Add OpenAI v1 python api compatible chat completion function * Support non-streaming multi-tool calls * Format * Include function_call in response.	2024-02-12 15:56:07 -05:00
Andrei Betlen	69413ce08e	Update llama.cpp	2024-02-11 19:00:17 -05:00
Connor	a05d90446f	fix: Circular dependency preventing early Llama object free (#1176 ) commit `901827013b` introduced a cyclic dependency within Llama objects. That change causes old models to linger in memory longer than necessary, thereby creating memory bloat in most applications attempting to switch between models at runtime. This patch simply removes the problematic line, allowing models to deallocate without relying on GC. One might also consider combining `weakref.ref` with a `@property` if the `llama` attribute is absolutely necessary to expose in the tokenizer class.	2024-02-11 13:57:57 -05:00
Andrei Betlen	4abb8c9386	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2024-02-09 13:32:31 -05:00
Andrei Betlen	e16f06e6eb	fix: revert _create_completions.	2024-02-09 02:02:13 -05:00
Andrei Betlen	85d3374b4d	fix: broken import	2024-02-08 01:13:28 -05:00
Andrei Betlen	b5fca911b5	feat: Move tokenizer to own module	2024-02-08 01:08:18 -05:00
Jeffrey Fong	901827013b	feat: Integrate functionary v1.4 and v2 models + add custom tokenizer support to Llama class (#1078 ) * convert functionary-v1 chat handler to use hf autotokenizer * add hf_tokenizer + integrate functionary-v1.4 prompt template * integrate functionary v2 prompt template * update readme * set up parallel function calling wip * set up parallel function calling * Update README.md * Update README.md * refactor tokenizers * include old functionary handler for backward compatibility * add hf_tokenizer_path in server ModelSettings * convert functionary-v1 chat handler to use hf autotokenizer * add hf_tokenizer + integrate functionary-v1.4 prompt template * integrate functionary v2 prompt template * update readme * set up parallel function calling wip * resolve merge conflict * Update README.md * Update README.md * refactor tokenizers * include old functionary handler for backward compatibility * add hf_tokenizer_path in server ModelSettings * Cleanup PR, fix breaking changes * Use hf_pretrained_model_name_or_path for tokenizer * fix hf tokenizer in streaming * update README * refactor offset mapping --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-02-07 20:07:03 -05:00
Andrei Betlen	34f31040f6	Bump version	2024-02-06 12:47:59 -05:00
Andrei Betlen	59760c85ed	fix: Use llama_log_callback to avoid suppress_stdout_stderr	2024-02-05 21:52:12 -05:00
Andrei Betlen	3553b14670	Update llama.cpp	2024-02-05 13:26:50 -05:00
Andrei	7467f129e5	Revert "Fix: fileno error google colab (#729 ) (#1156 )" (#1157 ) This reverts commit `bebfba0f08`.	2024-02-02 12:18:55 -05:00
Dulsara	bebfba0f08	Fix: fileno error google colab (#729 ) (#1156 ) Instead of using a devnull just made a dummy class with a 'write()' method that does nothing.	2024-02-02 12:05:46 -05:00
Andrei Betlen	3322eadbf3	Bump version	2024-01-31 15:10:18 -05:00
Andrei	fb762a6041	Add speculative decoding (#1120 ) * Add draft model param to llama class, implement basic prompt lookup decoding draft model * Use samplingcontext for sampling * Use 1d array * Use draft model for sampling * Fix dumb mistake * Allow for later extensions to the LlamaDraftModel api * Cleanup * Adaptive candidate prediction * Update implementation to match hf transformers * Tuning * Fix bug where last token was not used for ngram prediction * Remove heuristic for num_pred_tokens (no benefit) * fix: n_candidates bug. * Add draft_model_num_pred_tokens server setting * Cleanup * Update README	2024-01-31 14:08:14 -05:00
Andrei Betlen	71e3e4c435	Update llama.cpp	2024-01-31 10:41:42 -05:00
Andrei Betlen	078cca0361	fix: Pass raise_exception and add_generation_prompt to jinja2 chat template	2024-01-31 08:42:21 -05:00
Andrei Betlen	bf9e824922	Bump version	2024-01-30 12:27:27 -05:00
Andrei Betlen	011cd84ded	Update llama.cpp	2024-01-30 09:48:09 -05:00
Andrei	da003d8768	Automatically set chat format from gguf (#1110 ) * Use jinja formatter to load chat format from gguf * Fix off-by-one error in metadata loader * Implement chat format auto-detection	2024-01-29 14:22:23 -05:00
Andrei Betlen	464af5b39f	Bump version	2024-01-29 10:46:04 -05:00
Andrei Betlen	9ae5819ee4	Add chat format test.	2024-01-29 00:59:01 -05:00
Rafaelblsilva	ce38dbdf07	Add mistral instruct chat format as "mistral-instruct" (#799 ) * Added mistral instruct chat format as "mistral" * Fix stop sequence (merge issue) * Update chat format name to `mistral-instruct` --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-01-29 00:34:42 -05:00
Andrei Betlen	52c4a84faf	Bump version	2024-01-28 19:35:37 -05:00
Andrei Betlen	c1d0fff8a9	Bump version	2024-01-27 18:36:56 -05:00
Andrei	d8f6914f45	Add json schema mode (#1122 ) * Add json schema mode * Add llava chat format support	2024-01-27 16:52:18 -05:00
Andrei Betlen	35918873b4	Update llama.cpp	2024-01-26 11:45:48 -05:00
Andrei Betlen	f5cc6b3053	Bump version	2024-01-25 11:28:16 -05:00
Andrei Betlen	cde7514c3d	feat(server): include llama-cpp-python version in openapi spec	2024-01-25 11:23:18 -05:00
Andrei Betlen	c970d41a85	fix: llama_log_set should be able to accept null pointer	2024-01-24 10:38:30 -05:00
Andrei Betlen	9677a1f2c8	fix: Check order	2024-01-23 22:28:03 -05:00
Andrei Betlen	4d6b2f7b91	fix: format	2024-01-23 22:08:27 -05:00
Phil H	fe5d6ea648	fix: GGUF metadata KV overrides, re #1011 (#1116 ) * kv overrides another attempt * add sentinel element, simplify array population * ensure sentinel element is zeroed	2024-01-23 22:00:38 -05:00
Andrei Betlen	fcdf337d84	Update llama.cpp	2024-01-22 11:25:11 -05:00
Andrei Betlen	5b982d0f8c	fix: use both eos and bos tokens as stop sequences for hf-tokenizer-config chat format.	2024-01-22 08:32:48 -05:00
Andrei Betlen	2ce0b8aa2c	Bump version	2024-01-21 20:30:24 -05:00
Andrei Betlen	d3f5528ca8	fix: from_json_schema oneof/anyof bug. Closes #1097	2024-01-21 19:06:53 -05:00
Andrei Betlen	24f39454e9	fix: pass chat handler not chat formatter for huggingface autotokenizer and tokenizer_config formats.	2024-01-21 18:38:04 -05:00
Andrei Betlen	7f3209b1eb	feat: Add add_generation_prompt option for jinja2chatformatter.	2024-01-21 18:37:24 -05:00
Andrei Betlen	be09318c26	feat: Add Jinja2ChatFormatter	2024-01-19 15:04:42 -05:00
Andrei Betlen	5a34c57e54	feat: Expose gguf model metadata in metadata property	2024-01-19 10:46:03 -05:00
Andrei Betlen	833a7f1a86	Bump version	2024-01-19 09:03:35 -05:00
Andrei Betlen	3babe3512c	Fix mirostat sampling	2024-01-19 08:31:59 -05:00
Andrei Betlen	141293a75b	Fix python3.8 support	2024-01-19 08:17:49 -05:00
Andrei Betlen	656f3d8968	Bump version	2024-01-18 21:30:36 -05:00
Andrei Betlen	89cce50f8c	Update llama.cpp	2024-01-18 21:21:49 -05:00
Andrei Betlen	b8fc1c7d83	feat: Add ability to load chat format from huggingface autotokenizer or tokenizer_config.json files.	2024-01-18 21:21:37 -05:00
Andrei Betlen	48c3b77e6f	Offload KQV by default	2024-01-18 11:08:57 -05:00
Austin	6bfe98bd80	Integration of Jinja2 Templating (#875 ) * feat: Add support for jinja templating Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * fix: Refactor chat formatter and update interface for jinja templates - Simplify the `llama2_template` in `llama_jinja_format.py` by removing unnecessary line breaks for readability without affecting functionality. - Update `ChatFormatterInterface` constructor to accept a more generic `Optional[object]` type for the template parameter, enhancing flexibility. - Introduce a `template` property to `ChatFormatterInterface` for standardized access to the template string. - Replace `MetaSingleton` metaclass with `Singleton` for the `ChatFormatterFactory` to streamline the singleton implementation. These changes enhance code readability, maintain usability, and ensure consistency in the chat formatter's design pattern usage. * Add outline for Jinja2 templating integration documentation Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * Add jinja2 as a dependency with version range for Hugging Face transformers compatibility Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * Update jinja2 version constraint for mkdocs-material compatibility Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * Fix attribute name in AutoChatFormatter - Changed attribute name from `self._renderer` to `self._environment` --------- Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>	2024-01-17 09:47:52 -05:00
Andrei Betlen	7b46bb5a78	Re-order classes in llama.py	2024-01-17 09:16:13 -05:00
Andrei Betlen	cc4630e66f	Move helper classes to _internals submodule	2024-01-17 09:14:00 -05:00
Andrei Betlen	3b92419132	Move cache classes to llama_cache submodule.	2024-01-17 09:09:12 -05:00
Kyle Mistele	9c36688b33	fix(cli): allow passing n_ctx=0 to openAI API server args to use model n_ctx_train field per #1015 (#1093 )	2024-01-16 18:54:06 -05:00
anil	cfb7da98ed	Support Accept text/event-stream in chat and completion endpoints, resolves #1083 (#1088 ) Co-authored-by: Anil Pathak <anil@heyday.com> Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2024-01-16 12:52:52 -05:00
Andrei Betlen	4b11fa83c0	Bump version	2024-01-15 12:54:51 -05:00
Andrei Betlen	84615adbc6	Add split_mode option. Closes #1085	2024-01-15 12:49:20 -05:00
Phil H	76aafa6149	Implement GGUF metadata KV overrides (#1011 ) * Implement GGUF metadata overrides * whitespace fix * Fix kv overrides. * Fix pointer and pickle * Match llama.cpp kv_overrides cli argument --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-01-15 12:29:29 -05:00
yieldthought	7eff42c239	Avoid "LookupError: unknown encoding: ascii" when open() called in a destructor (#1012 ) The existing code often causes "LookupError: unknown encoding: ascii" when open() called in a destructor. Saving open in self.open is not enough to avoid this. Instead, we can avoid reopening /dev/null every time by doing it once when the module is loaded.	2024-01-15 10:52:10 -05:00
Mark Neumann	c689ccc728	Fix Pydantic model parsing (#1087 )	2024-01-15 10:45:57 -05:00
Andrei Betlen	5502ac8876	Update llama.cpp	2024-01-15 10:12:10 -05:00
Andrei Betlen	359ae73643	Update llama.cpp	2024-01-14 08:17:22 -05:00
Andrei Betlen	7c898d5684	Update llama.cpp	2024-01-13 22:37:49 -05:00
Andrei Betlen	bb610b9428	Update llama.cpp	2024-01-11 22:51:12 -05:00
Andrei Betlen	f0159663d9	Bump version	2024-01-10 02:51:17 -05:00
Stephen Hankinson	df3be58d6c	Add ability to pass in penalize_nl param (#1068 )	2024-01-10 02:46:27 -05:00
Joseph Turian	2ddce7294e	print_grammar to stderr (#1052 )	2024-01-10 02:46:03 -05:00
Andrei Betlen	1ae05c102b	Update llama.cpp	2024-01-08 14:51:29 -05:00
Andrei Betlen	75d0527fd7	Bump version	2024-01-04 18:30:12 -05:00
Fedor Moiseev	907b9e9d42	Add Saiga chat format. (#1050 )	2024-01-04 18:12:58 -05:00
xaviviro	cf743ec5d3	Added ChatGLM chat format (#1059 ) Co-authored-by: Xavier Vinaixa Rosello <xaviviro@MacBook-Pro-de-Xavier.local>	2024-01-04 18:12:02 -05:00
Andrei Betlen	eb9c7d4ed8	Update llama.cpp	2024-01-03 22:04:04 -05:00
Andrei Betlen	011c3630f5	Bump version	2023-12-27 17:35:02 -05:00
Andrei Betlen	92284f32cb	Add HIP_PATH to dll search directories for windows users.	2023-12-22 15:29:56 -05:00
Andrei Betlen	2b0d3f36fa	set llama_max_devices using library function	2023-12-22 15:19:28 -05:00
Andrei Betlen	d9a1d90fd7	Fix typo	2023-12-22 15:12:27 -05:00
Andrei Betlen	37556bf9c4	Bump version	2023-12-22 14:55:58 -05:00
Andrei Betlen	6d8bc090f9	fix: incorrect bindings for kv override. Based on #1011	2023-12-22 14:52:20 -05:00
Andrei Betlen	522aecb868	docs: add server config docs	2023-12-22 14:37:24 -05:00
Andrei Betlen	6473796343	Update llama.cpp	2023-12-22 14:10:34 -05:00
swg	4b01a873ef	server: Support none defaulting to infinity for completions (#111 ) * Support defaulting to infinity or -1 for chat completions * Check if completion_tokens is none in error handler. * fix: max_tokens in create completion should match openai spec * Fix __call__ --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2023-12-22 14:05:13 -05:00
Dave	12b7f2f4e9	[Feat] Multi model support (#931 ) * Update Llama class to handle chat_format & caching * Add settings.py * Add util.py & update __main__.py * multimodel * update settings.py * cleanup * delete util.py * Fix /v1/models endpoint * MultiLlama now iterable, app check-alive on "/" * instant model init if file is given * backward compatibility * revert model param mandatory * fix error * handle individual model config json * refactor * revert chathandler/clip_model changes * handle chat_handler in MultiLlama() * split settings into server/llama * reduce global vars * Update LlamaProxy to handle config files * Add free method to LlamaProxy * update arg parsers & install server alias * refactor cache settings * change server executable name * better var name * whitespace * Revert "whitespace" This reverts commit bc5cf51c64a95bfc9926e1bc58166059711a1cd8. * remove exe_name * Fix merge bugs * Fix type annotations * Fix type annotations * Fix uvicorn app factory * Fix settings * Refactor server * Remove formatting fix * Format * Use default model if not found in model settings * Fix * Cleanup * Fix * Fix * Remove unused CommandLineSettings * Cleanup * Support default name for copilot-codex models --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2023-12-22 05:51:25 -05:00
Andrei Betlen	4a85442c35	Update llama.cpp	2023-12-22 00:12:37 -05:00
twaka	2f03fb0231	fix text_offset of multi-token characters (#1037 ) * fix text_offsets for bytes tokens * fix	2023-12-22 00:03:29 -05:00
docmeth02	33cc623346	Implement openai api compatible authentication (#1010 )	2023-12-21 13:44:49 -05:00
Andrei Betlen	a05b4da80a	fix: float32 is not JSON serializable when streaming logits.	2023-12-18 18:40:36 -05:00
Andrei Betlen	7df6c32544	Fix type annotations	2023-12-18 18:14:53 -05:00
Andrei Betlen	b703aad79e	Fix type annotation	2023-12-18 18:13:37 -05:00
Andrei Betlen	d0aedfcff6	Fix type annotation	2023-12-18 18:12:49 -05:00
Eduard Christian Dumitrescu	2993936b10	Fix ctypes definitions of `llama_kv_cache_view_update` and `llama_kv_cache_view_free`. (#1028 )	2023-12-18 18:11:26 -05:00
Andrei Betlen	5e863d8a3b	Bump version	2023-12-18 16:09:18 -05:00
Andrei Betlen	095c650006	Add offload_kqv option to llama and server	2023-12-18 15:36:09 -05:00
Andrei Betlen	472b344ae3	Remove unused import	2023-12-18 15:32:40 -05:00
kddubey	6b2e0e05b4	perf: Don't convert logprobs arrays to lists (#1021 )	2023-12-18 14:28:12 -05:00
Brandon Roberts	62944df142	Bugfix: Remove f16_kv, add offload_kqv field (#1019 ) F16_KV appears to have been removed here: `af99c6fbfc` This addresses two issues: - #995 which just requests to add the KV cache offloading param - #1006 a NULL ptr exception when using the embeddings (introduced by leaving f16_kv in the fields struct)	2023-12-18 14:27:11 -05:00
Daniele Morotti	f1c631dc53	Bug fixed with n_ctx=0 (#1015 ) If the n_ctx is set to 0 the code should use the maximum context length of the selected model, but it didn't work. There was a problem with the initialization of this parameter and a related problem with 'n_batch'.	2023-12-16 18:59:50 -05:00
kddubey	5a8944672f	Fix logits_to_logprobs for 2-D and 3-D logits (#1002 ) * Fix logits_to_logprobs for 2-D and 3-D logits * Set dtype to single * Test size	2023-12-16 18:59:26 -05:00
Andrei Betlen	534b1ea9b5	Update llama.cpp	2023-12-16 18:57:43 -05:00
Andrei Betlen	cbce061ffd	Bump version	2023-12-13 21:52:29 -05:00
yhfgyyf	8b4db732bd	Add qwen chat format (#1005 )	2023-12-13 21:43:43 -05:00
Andrei Betlen	690c563b60	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-12-13 21:43:19 -05:00
Andrei Betlen	c0fc0a1e82	Update llama.cpp	2023-12-13 21:43:16 -05:00
Radoslav Gerganov	8e44a32075	Add support for running the server with SSL (#994 )	2023-12-11 20:47:11 -05:00
Tanner Hobson	ef22e478db	Replace logits_to_logprobs implementation with numpy equivalent to llama.cpp (#991 ) See #990. This change makes the logits_to_logprobs function equivalent to the version in the llama.cpp repository. It uses numpy so it's much faster than the previous version.	2023-12-11 20:46:27 -05:00
zocainViken	ac35f68e4d	Fix UnsupportedOperation: fileno in suppress_stdout_stderr (#961 ) * bug fixing * llava from readme got this error: UnsupportedOperation: fileno quick fix by checking hasattr * multi modal params fix: add logits = True -> to make llava work * multi modal params fix: add logits = True -> to make llava work --------- Co-authored-by: Andrei <abetlen@gmail.com>	2023-12-11 20:44:51 -05:00
chiensen	b938cccf05	Add Pygmalion chat format (#986 )	2023-12-11 20:44:04 -05:00
Andrei Betlen	c1e73e73a3	Bump version	2023-12-11 10:26:42 -05:00
Andrei Betlen	ec26f364cc	Remove f16_kv	2023-12-11 10:25:37 -05:00
Andrei Betlen	f1edc66b21	Update llama.cpp	2023-12-11 10:21:35 -05:00
kddubey	b069d06346	Fix #891 (#952 )	2023-11-29 05:39:52 -05:00
Andrei Betlen	ad963a0961	Bump version	2023-11-28 04:58:20 -05:00
Andrei Betlen	e3941d9c67	Make building llava optional	2023-11-28 04:55:21 -05:00
Andrei Betlen	7f3704b896	Bump version	2023-11-27 19:14:25 -05:00
Andrei Betlen	396dbf0b2b	docs: Improve low-level docstrings	2023-11-27 19:03:02 -05:00
Andrei Betlen	a928893d03	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-11-26 15:57:13 -05:00
Andrei Betlen	6308f21d5e	docs: Update Llama docs	2023-11-26 15:56:40 -05:00
Gardner Bickford	c2d63a7148	fix: Typo in the Open Orca chat format #874 (#947 )	2023-11-26 15:39:18 -05:00
Andrei Betlen	f03a38e62a	Update llama.cpp	2023-11-26 15:38:22 -05:00
Andrei Betlen	1a7bf2037b	docs: Update openapi endpoint names	2023-11-24 03:39:29 -05:00
Andrei Betlen	4026166e68	docs: Update completion and chat_completion parameter docstrings	2023-11-24 03:24:19 -05:00
Andrei Betlen	8c3aa7858b	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-11-24 00:15:36 -05:00
Andrei Betlen	de2e2bc083	misc fix verbose printing in functionary model	2023-11-23 20:14:23 -05:00
Andrei Betlen	36048d46af	Update llama.cpp	2023-11-23 16:26:00 -05:00
mrfakename	d68fc07b1b	Add Zephyr format (#937 )	2023-11-23 01:20:08 -05:00
caiyesd	4184835078	Add chat format to support baichuan (#938 ) Signed-off-by: caiyesd <caiyesd@gmail.com>	2023-11-23 01:19:50 -05:00
Andrei Betlen	c647f01609	Add from_json_schema to LlamaGrammar	2023-11-23 00:27:00 -05:00
Andrei Betlen	be1f64d569	docs: Add docstrings from llama.cpp	2023-11-23 00:26:26 -05:00
Andrei Betlen	b6bb7ac76a	docs: Add Llama class example	2023-11-22 23:10:04 -05:00
caiyesd	b8f29f4bf0	Add baichuan-2 chat format (#936 ) Signed-off-by: caiyesd <caiyesd@gmail.com>	2023-11-22 06:08:06 -05:00
Andrei Betlen	8b6ca22846	Fix type warnings for json schema grammar converter	2023-11-21 13:32:00 -05:00
Andrei Betlen	230fc8b535	Bump version	2023-11-21 05:04:55 -05:00
Andrei Betlen	128dc4731f	Fix #569	2023-11-21 04:39:05 -05:00
Andrei Betlen	7a3f87846b	Format	2023-11-21 04:02:20 -05:00
Andrei Betlen	422ebc89ce	Fix: Add logit_bias to all completion api methods	2023-11-21 04:01:36 -05:00
Andrei Betlen	07e47f55ba	Add support for logit_bias outside of server api. Closes #827	2023-11-21 03:59:46 -05:00
Maarten ter Huurne	c21edb6908	Do not set `grammar` to `None` for new `LlamaGrammar` objects (#834 ) * Do not set `grammar` to `None` for new `LlamaGrammar` objects The `grammar` attribute is written by `init()`, but that method always returns `None`, so `__init__()` would then discard the previously written object. * Add minimal test for grammar parsing	2023-11-21 00:23:18 -05:00
mrfakename	ef65fc5ff4	Add MistralLite, Intel, and OpenChat prompt formats (#927 ) * Add MistralLite format * Update llama_chat_format.py * Update llama_chat_format.py	2023-11-21 00:19:25 -05:00
TK-Master	b8438f70b5	Added support for min_p (#921 ) * Added support for min_p My small contribution to this great project. Ref: https://github.com/ggerganov/llama.cpp/pull/3841 Closes: https://github.com/abetlen/llama-cpp-python/issues/911 * Fix for negative temp (sample_softmax)	2023-11-20 23:21:33 -05:00
Andrei Betlen	a34d480141	Fix #929	2023-11-20 22:50:59 -05:00
Andrei Betlen	2c2afa320f	Update llama.cpp	2023-11-20 14:11:33 -05:00
Andrei Betlen	f2901d840e	Bump version	2023-11-14 14:10:00 -05:00
Andrei Betlen	01846a76b9	Bump version	2023-11-10 16:36:12 -05:00
Andrei Betlen	b7e60b66f4	Bump version	2023-11-10 06:21:24 -05:00
Andrei Betlen	6f0b0b1b84	Fix sampling bug when logits_all=False	2023-11-10 05:15:41 -05:00
Andrei Betlen	d9b38e3e3a	Potential bugfix for eval	2023-11-10 04:41:19 -05:00
Andrei Betlen	b84d76a844	Fix: add default stop sequence to chatml chat format	2023-11-10 04:24:48 -05:00
Andrei Betlen	1b376c62b7	Update functionary for new OpenAI API	2023-11-10 02:51:58 -05:00
Andrei Betlen	17da8fb446	Add missing tool_calls finish_reason	2023-11-10 02:51:06 -05:00
Andrei Betlen	770df34436	Add $ref and $defs support to json schema converter	2023-11-10 02:50:46 -05:00
Andrei Betlen	faeae181b1	Fix: json_schema_to_gbnf should take string dump of json schema as input	2023-11-10 02:50:17 -05:00
Andrei Betlen	e7962d2c73	Fix: default max_tokens matches openai api (16 for completion, max length for chat completion)	2023-11-10 02:49:27 -05:00
Andrei Betlen	b62c449839	Bugfix: missing response_format for functionary and llava chat handlers	2023-11-09 00:55:23 -05:00
Andrei Betlen	fd41ed3a90	Add set_seed to Llama class	2023-11-08 11:09:41 -05:00
Andrei Betlen	ca4cb88351	Fix destructor NoneType is not callable error	2023-11-08 11:05:45 -05:00
Andrei Betlen	01cb3a0381	Bump version	2023-11-08 00:54:54 -05:00
Andrei Betlen	b30b9c338b	Add JSON mode support. Closes #881	2023-11-08 00:07:16 -05:00
Andrei Betlen	4852a6a39c	Fix built in GBNF grammar rules	2023-11-08 00:06:22 -05:00
Andrei Betlen	64f5153c35	Add seed parameter to chat handlers	2023-11-07 23:41:29 -05:00
Andrei Betlen	86aeb9f3a1	Add seed parameter support for completion and chat_completion requests. Closes #884	2023-11-07 23:37:28 -05:00
Damian Stewart	aab74f0b2b	Multimodal Support (Llava 1.5) (#821 ) * llava v1.5 integration * Point llama.cpp to fork * Add llava shared library target * Fix type * Update llama.cpp * Add llava api * Revert changes to llama and llama_cpp * Update llava example * Add types for new gpt-4-vision-preview api * Fix typo * Update llama.cpp * Update llama_types to match OpenAI v1 API * Update ChatCompletionFunction type * Reorder request parameters * More API type fixes * Even More Type Updates * Add parameter for custom chat_handler to Llama class * Fix circular import * Convert to absolute imports * Fix * Fix pydantic Jsontype bug * Accept list of prompt tokens in create_completion * Add llava1.5 chat handler * Add Multimodal notebook * Clean up examples * Add server docs --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2023-11-07 22:48:51 -05:00
Andrei Betlen	56171cf7bf	Bump version	2023-11-06 09:37:55 -05:00
Andrei Betlen	be0add1b2d	Fix type bug	2023-11-06 09:30:38 -05:00
Andrei Betlen	e214a58422	Refactor Llama class internals	2023-11-06 09:16:36 -05:00
Andrei Betlen	bbffdaebaa	Refactor autotokenizer format to reusable function	2023-11-06 09:07:27 -05:00
Joe	4ff8def4d0	#717 : Add support for Huggingface Autotokenizer (#790 ) Co-authored-by: Andrei <abetlen@gmail.com>	2023-11-05 18:06:36 -05:00
earonesty	3580e2c5df	Update llama_chat_format.py (#869 ) * Update llama_chat_format.py properly format llama2 with first-message prompt embedded * Update llama_chat_format.py	2023-11-05 17:00:13 -05:00
Andrei Betlen	f0b30ef7dc	Update llama.cpp	2023-11-05 16:57:10 -05:00
Andrei Betlen	2ec043af76	Clean up stdout / stderr suppression	2023-11-03 13:02:15 -04:00
Andrei Betlen	4ea7027c41	Rename internal only module utils to _utils	2023-11-03 12:55:55 -04:00
Andrei Betlen	df9362eeea	Update llama.cpp	2023-11-03 11:34:50 -04:00
Andrei	3af7b21ff1	Add functionary support (#784 ) * Add common grammars and json-schema-to-grammar utility function from llama.cpp * Pass functions to format function * Add basic functionary formatting * Add LlamaChatHandler for more complex chat use cases * Add function calling example notebook * Add support for regular chat completions alongside function calling	2023-11-03 02:12:14 -04:00
Andrei	ab028cb878	Migrate inference to llama_batch and llama_decode api (#795 ) * Add low-level batching notebook * fix: tokenization of special characters: (#850) It should behave like llama.cpp, where most out of the box usages treat special characters accordingly * Update CHANGELOG * Cleanup * Fix runner label * Update notebook * Use llama_decode and batch api * Support logits_all parameter --------- Co-authored-by: Antoine Lizee <antoine.lizee@gmail.com>	2023-11-02 20:13:57 -04:00
Andrei Betlen	8350de9a18	Bump version	2023-11-02 15:53:01 -04:00
Andrei Betlen	011b95d7f3	Fix name 'open' is not defined exception. Closes #860	2023-11-02 15:30:55 -04:00
Andrei Betlen	fa83cc5f9c	Update llama.cpp Fix build examples Exclude examples directory Revert cmake changes Try actions/checkout@v4 Try to update submodules Revert Update llama.cpp Fix build examples Exclude examples directory Revert cmake changes Try actions/checkout@v4 Try to update submodules Revert	2023-11-02 14:28:15 -04:00
Antoine Lizee	4d4e0f11e2	fix: tokenization of special characters: (#850 ) It should behave like llama.cpp, where most out of the box usages treat special characters accordingly	2023-11-02 14:28:14 -04:00
Andrei Betlen	6b3aa7fc8f	Bump version	2023-11-01 19:25:03 -04:00
Sujeendran Menon	7b136bb5b1	Fix for shared library not found and compile issues in Windows (#848 ) * fix windows library dll name issue * Updated README.md Windows instructions * Update llama_cpp.py to handle different windows dll file versions	2023-11-01 18:55:57 -04:00
cebtenzzre	eefd76fe81	llama: fix exception in Llama.__del__ (#846 )	2023-11-01 18:53:57 -04:00
David Ponce	3fc9147218	Iterate over tokens that should be biased rather than the entire vocabulary. (#851 )	2023-11-01 18:53:47 -04:00
Marko Tasic	9c8f4dca5f	fixed Llama._create_completion suffix check, it can be either None or str instance (#854 )	2023-11-01 18:52:50 -04:00
Daniel Thuerck	5f8f369d1b	Pass-Through grammar parameter in web server. (#855 ) Closes #778	2023-11-01 18:51:12 -04:00
Adam Katora	25cb710281	Update llama_types.py (#849 ) Minor typo fix, funcion -> function	2023-11-01 18:50:11 -04:00
Andrei Betlen	d808fd436c	Update llama.cpp	2023-10-31 21:29:35 -04:00
Andrei Betlen	53861c9e53	Update llama.cpp	2023-10-24 03:13:32 -04:00
gmcgoldr	09a8406c83	Fix streaming doesn't return finish reason (#798 ) When streaming the yield that contains the finish can be skipped. This change ensures that yield isn't skipped.	2023-10-19 02:55:56 -04:00
Andrei Betlen	28c2b884e2	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-10-19 02:55:31 -04:00
Andrei Betlen	ff580031d2	Update llama.cpp	2023-10-19 02:55:08 -04:00
Xiaoyu Kevin Hu	a315128d66	update value check for n_gpu_layers field (#826 )	2023-10-18 18:25:25 -04:00
Pierre Alexandre SCHEMBRI	10304d75fc	Make use of suppress_stdout_stderr when freeing model (#803 )	2023-10-15 13:52:43 -04:00
Ma, Guokai	a1ac199980	Fix repeat greeting (#808 ) * fix repeated greeting * remove separator between role and message	2023-10-15 13:52:21 -04:00
Eric Liu	b50166500e	Add validation for tensor_split size exceeding LLAMA_MAX_DEVICES (#820 ) * Add validation for tensor_split size exceeding LLAMA_MAX_DEVICES * reword	2023-10-15 13:51:51 -04:00
Andrei Betlen	d6a130a052	Print traceback on server error	2023-10-10 15:56:04 -04:00
Andrei Betlen	43dfe1e2ab	Update llama.cpp	2023-10-05 16:07:49 -04:00
Andrei Betlen	a7d17b8ac9	Update llama.cpp	2023-10-03 15:23:35 -04:00
Andrei Betlen	305482bd41	Add chatml chat format	2023-09-30 21:01:34 -04:00
Andrei Betlen	5ef5280ef9	Log server exceptions to stdout	2023-09-30 19:13:36 -04:00
Andrei Betlen	fab4bccc35	Bump version	2023-09-30 16:04:46 -04:00
Andrei Betlen	d696251fbe	Fix logits_all bug	2023-09-30 16:02:35 -04:00
Andrei Betlen	6ee413d79e	Bump version	2023-09-30 13:23:09 -04:00
Andrei Betlen	42bb721d64	Fix bug in embedding	2023-09-30 13:20:22 -04:00
Andrei Betlen	5d62d55a82	Bump version	2023-09-30 00:07:06 -04:00
Andrei Betlen	386c88b68e	Bump version	2023-09-29 20:07:31 -04:00
Andrei Betlen	d9bce17794	Update server params	2023-09-29 19:59:12 -04:00
Andrei Betlen	3720c739d4	Update llama.cpp	2023-09-29 19:58:21 -04:00
Andrei	3bca7708fb	Configurable Chat Formats (#711 ) * Add configurable default chat completion format. * Remove chat_template file to avoid circular import * Update llama_types * Add chat format	2023-09-29 19:52:04 -04:00
Josh XT	a945404b4a	Fix rope scaling defaults (#767 ) * Fix rope scale with backwards compatibility * Fix defaults * Fix op * Remove backwards compatibility * Check single val	2023-09-29 16:03:57 -04:00
Andrei Betlen	1a1c3dc418	Update llama.cpp	2023-09-28 22:42:03 -04:00
Andrei Betlen	4177ae6d34	Bump version	2023-09-25 14:38:38 -04:00
Viacheslav/Slava Tradunsky	3d5e5b1c04	Adds openai-processing-ms response header (#748 )	2023-09-25 13:55:58 -04:00
Andrei Betlen	dbca136fea	Update llama_types and names to match openai api	2023-09-20 15:38:26 -04:00
Andrei Betlen	38e34c97f0	Update llama.cpp	2023-09-18 16:11:27 -04:00
Andrei Betlen	8d75016549	Install required runtime dlls to package directory on windows	2023-09-16 14:57:49 -04:00
Andrei Betlen	acf18fcdf0	Bump version	2023-09-15 14:22:21 -04:00
Andrei Betlen	b047b3034e	Remove confusing helpstring from server cli args. Closes #719	2023-09-15 14:09:43 -04:00
Andrei Betlen	24fec0b242	Bump version	2023-09-14 18:33:08 -04:00
Andrei Betlen	8474665625	Update base_path to fix issue resolving dll in windows isolation container.	2023-09-14 14:51:43 -04:00
Andrei Betlen	507bcc7171	Bump version	2023-09-13 23:15:23 -04:00
Andrei Betlen	0449d29b9f	Fix boolean env vars and cli arguments	2023-09-13 23:09:57 -04:00
earonesty	58a6e42cc0	Update app.py (#705 )	2023-09-13 23:01:34 -04:00
Andrei Betlen	f4090a0bb2	Add numa support, low level api users must now explicitly call llama_backend_init at the start of their programs.	2023-09-13 23:00:43 -04:00
Andrei Betlen	c999325e8e	Fix boolean cli flags	2023-09-13 22:56:10 -04:00
Andrei Betlen	4daf77e546	Format	2023-09-13 21:23:23 -04:00
Andrei Betlen	2920c4bf7e	Update server params. Added lora_base, lora_path, low_vram, and main_gpu. Removed rms_norm_eps and n_gqa (deprecated in llama.cpp)	2023-09-13 21:23:13 -04:00
Andrei Betlen	6a20293fc2	Reorder init params to match llama.cpp order	2023-09-13 21:20:26 -04:00
Andrei Betlen	c8f9b8a734	Explicitly make all init params other than model_path into keyword only params	2023-09-13 21:19:47 -04:00
Andrei Betlen	a68f9e2791	Add kwargs to init to catch extra params	2023-09-13 21:19:02 -04:00
Andrei Betlen	9e345a47a2	remove print	2023-09-13 21:12:27 -04:00
Andrei Betlen	517f9ed80b	Convert missed llama.cpp constants into standard python types	2023-09-13 21:11:52 -04:00
Andrei Betlen	c4c440ba2d	Fix tensor_split cli option	2023-09-13 20:00:42 -04:00
Andrei Betlen	203ede4ba2	Bump version	2023-09-13 18:07:08 -04:00
Andrei Betlen	759405c84b	Fix issue with Literal and Optional cli arguments not working. Closes #702	2023-09-13 18:06:12 -04:00
Devrim	da9df78db0	Add X-Request-ID request header for mirroring custom IDs. (#703 )	2023-09-13 16:18:31 -04:00
Andrei Betlen	8e13520796	Bump version	2023-09-13 01:47:58 -04:00
Andrei Betlen	2787663a25	Bump version	2023-09-12 21:00:01 -04:00
Andrei Betlen	6e89775759	Bump version	2023-09-12 18:57:01 -04:00
Andrei Betlen	bb4e67e7aa	Using dynamic version	2023-09-12 18:56:36 -04:00
Andrei Betlen	1910793f56	Merge branch 'main' into v0.2-wip	2023-09-12 16:43:32 -04:00
Andrei Betlen	c7901f1141	Bump version	2023-09-12 16:16:40 -04:00
janvdp	33ce931cce	merge upstream	2023-09-09 21:21:04 +02:00
Andrei Betlen	d3f63211ef	Update llama.cpp	2023-09-09 12:12:32 -04:00
janvdp	da0fdafc32	import version in __init__.py	2023-09-05 21:09:28 +02:00
janvdp	6e8e64d09a	add version file	2023-09-05 21:09:08 +02:00
Andrei Betlen	186626d58e	Update llama.cpp	2023-09-01 14:26:13 -04:00
Andrei Betlen	47de3ab104	Update llama.cpp	2023-08-29 07:36:20 -04:00
Andrei Betlen	3f76e1de52	cjk pr minor cleanup	2023-08-29 07:21:59 -04:00
Andrei	bae44ec8bf	Merge pull request #309 from MeouSker77/fix-CJK Fix CJK and emoji stream output	2023-08-29 06:58:10 -04:00
Andrei Betlen	e0dcbc28a1	Update llama.cpp	2023-08-28 10:33:45 -04:00
Andrei Betlen	4887973c22	Update llama.cpp	2023-08-27 12:59:20 -04:00
Andrei Betlen	3a29d65f45	Update llama.cpp	2023-08-26 23:36:24 -04:00


824 commits