baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	3322eadbf3	Bump version	2024-01-31 15:10:18 -05:00
Andrei	fb762a6041	Add speculative decoding (#1120) * Add draft model param to llama class, implement basic prompt lookup decoding draft model * Use samplingcontext for sampling * Use 1d array * Use draft model for sampling * Fix dumb mistake * Allow for later extensions to the LlamaDraftModel api * Cleanup * Adaptive candidate prediction * Update implementation to match hf transformers * Tuning * Fix bug where last token was not used for ngram prediction * Remove heuristic for num_pred_tokens (no benefit) * fix: n_candidates bug. * Add draft_model_num_pred_tokens server setting * Cleanup * Update README	2024-01-31 14:08:14 -05:00
Andrei Betlen	71e3e4c435	Update llama.cpp	2024-01-31 10:41:42 -05:00
Andrei Betlen	078cca0361	fix: Pass raise_exception and add_generation_prompt to jinja2 chat template	2024-01-31 08:42:21 -05:00
Andrei Betlen	bf9e824922	Bump version	2024-01-30 12:27:27 -05:00
Andrei Betlen	011cd84ded	Update llama.cpp	2024-01-30 09:48:09 -05:00
Andrei	da003d8768	Automatically set chat format from gguf (#1110) * Use jinja formatter to load chat format from gguf * Fix off-by-one error in metadata loader * Implement chat format auto-detection	2024-01-29 14:22:23 -05:00
Andrei Betlen	464af5b39f	Bump version	2024-01-29 10:46:04 -05:00
Andrei Betlen	9ae5819ee4	Add chat format test.	2024-01-29 00:59:01 -05:00
Rafaelblsilva	ce38dbdf07	Add mistral instruct chat format as "mistral-instruct" (#799) * Added mistral instruct chat format as "mistral" * Fix stop sequence (merge issue) * Update chat format name to `mistral-instruct` --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-01-29 00:34:42 -05:00
Andrei Betlen	52c4a84faf	Bump version	2024-01-28 19:35:37 -05:00
Andrei Betlen	c1d0fff8a9	Bump version	2024-01-27 18:36:56 -05:00
Andrei	d8f6914f45	Add json schema mode (#1122) * Add json schema mode * Add llava chat format support	2024-01-27 16:52:18 -05:00
Andrei Betlen	35918873b4	Update llama.cpp	2024-01-26 11:45:48 -05:00
Andrei Betlen	f5cc6b3053	Bump version	2024-01-25 11:28:16 -05:00
Andrei Betlen	cde7514c3d	feat(server): include llama-cpp-python version in openapi spec	2024-01-25 11:23:18 -05:00
Andrei Betlen	c970d41a85	fix: llama_log_set should be able to accept null pointer	2024-01-24 10:38:30 -05:00
Andrei Betlen	9677a1f2c8	fix: Check order	2024-01-23 22:28:03 -05:00
Andrei Betlen	4d6b2f7b91	fix: format	2024-01-23 22:08:27 -05:00
Phil H	fe5d6ea648	fix: GGUF metadata KV overrides, re #1011 (#1116) * kv overrides another attempt * add sentinel element, simplify array population * ensure sentinel element is zeroed	2024-01-23 22:00:38 -05:00
Andrei Betlen	fcdf337d84	Update llama.cpp	2024-01-22 11:25:11 -05:00
Andrei Betlen	5b982d0f8c	fix: use both eos and bos tokens as stop sequences for hf-tokenizer-config chat format.	2024-01-22 08:32:48 -05:00
Andrei Betlen	2ce0b8aa2c	Bump version	2024-01-21 20:30:24 -05:00
Andrei Betlen	d3f5528ca8	fix: from_json_schema oneof/anyof bug. Closes #1097	2024-01-21 19:06:53 -05:00
Andrei Betlen	24f39454e9	fix: pass chat handler not chat formatter for huggingface autotokenizer and tokenizer_config formats.	2024-01-21 18:38:04 -05:00
Andrei Betlen	7f3209b1eb	feat: Add add_generation_prompt option for jinja2chatformatter.	2024-01-21 18:37:24 -05:00
Andrei Betlen	be09318c26	feat: Add Jinja2ChatFormatter	2024-01-19 15:04:42 -05:00
Andrei Betlen	5a34c57e54	feat: Expose gguf model metadata in metadata property	2024-01-19 10:46:03 -05:00
Andrei Betlen	833a7f1a86	Bump version	2024-01-19 09:03:35 -05:00
Andrei Betlen	3babe3512c	Fix mirostat sampling	2024-01-19 08:31:59 -05:00
Andrei Betlen	141293a75b	Fix python3.8 support	2024-01-19 08:17:49 -05:00
Andrei Betlen	656f3d8968	Bump version	2024-01-18 21:30:36 -05:00
Andrei Betlen	89cce50f8c	Update llama.cpp	2024-01-18 21:21:49 -05:00
Andrei Betlen	b8fc1c7d83	feat: Add ability to load chat format from huggingface autotokenizer or tokenizer_config.json files.	2024-01-18 21:21:37 -05:00
Andrei Betlen	48c3b77e6f	Offload KQV by default	2024-01-18 11:08:57 -05:00
Austin	6bfe98bd80	Integration of Jinja2 Templating (#875 ) * feat: Add support for jinja templating Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * fix: Refactor chat formatter and update interface for jinja templates - Simplify the `llama2_template` in `llama_jinja_format.py` by removing unnecessary line breaks for readability without affecting functionality. - Update `ChatFormatterInterface` constructor to accept a more generic `Optional[object]` type for the template parameter, enhancing flexibility. - Introduce a `template` property to `ChatFormatterInterface` for standardized access to the template string. - Replace `MetaSingleton` metaclass with `Singleton` for the `ChatFormatterFactory` to streamline the singleton implementation. These changes enhance code readability, maintain usability, and ensure consistency in the chat formatter's design pattern usage. * Add outline for Jinja2 templating integration documentation Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * Add jinja2 as a dependency with version range for Hugging Face transformers compatibility Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * Update jinja2 version constraint for mkdocs-material compatibility Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * Fix attribute name in AutoChatFormatter - Changed attribute name from `self._renderer` to `self._environment` --------- Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>	2024-01-17 09:47:52 -05:00
Andrei Betlen	7b46bb5a78	Re-order classes in llama.py	2024-01-17 09:16:13 -05:00
Andrei Betlen	cc4630e66f	Move helper classes to _internals submodule	2024-01-17 09:14:00 -05:00
Andrei Betlen	3b92419132	Move cache classes to llama_cache submodule.	2024-01-17 09:09:12 -05:00
Kyle Mistele	9c36688b33	fix(cli): allow passing n_ctx=0 to openAI API server args to use model n_ctx_train field per #1015 (#1093)	2024-01-16 18:54:06 -05:00
anil	cfb7da98ed	Support Accept text/event-stream in chat and completion endpoints, resolves #1083 (#1088) Co-authored-by: Anil Pathak <anil@heyday.com> Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2024-01-16 12:52:52 -05:00
Andrei Betlen	4b11fa83c0	Bump version	2024-01-15 12:54:51 -05:00
Andrei Betlen	84615adbc6	Add split_mode option. Closes #1085	2024-01-15 12:49:20 -05:00
Phil H	76aafa6149	Implement GGUF metadata KV overrides (#1011) * Implement GGUF metadata overrides * whitespace fix * Fix kv overrides. * Fix pointer and pickle * Match llama.cpp kv_overrides cli argument --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-01-15 12:29:29 -05:00
yieldthought	7eff42c239	Avoid "LookupError: unknown encoding: ascii" when open() called in a destructor (#1012) The existing code often causes "LookupError: unknown encoding: ascii" when open() called in a destructor. Saving open in self.open is not enough to avoid this. Instead, we can avoid reopening /dev/null every time by doing it once when the module is loaded.	2024-01-15 10:52:10 -05:00
Mark Neumann	c689ccc728	Fix Pydantic model parsing (#1087)	2024-01-15 10:45:57 -05:00
Andrei Betlen	5502ac8876	Update llama.cpp	2024-01-15 10:12:10 -05:00
Andrei Betlen	359ae73643	Update llama.cpp	2024-01-14 08:17:22 -05:00
Andrei Betlen	7c898d5684	Update llama.cpp	2024-01-13 22:37:49 -05:00
Andrei Betlen	bb610b9428	Update llama.cpp	2024-01-11 22:51:12 -05:00
Andrei Betlen	f0159663d9	Bump version	2024-01-10 02:51:17 -05:00
Stephen Hankinson	df3be58d6c	Add ability to pass in penalize_nl param (#1068)	2024-01-10 02:46:27 -05:00
Joseph Turian	2ddce7294e	print_grammar to stderr (#1052)	2024-01-10 02:46:03 -05:00
Andrei Betlen	1ae05c102b	Update llama.cpp	2024-01-08 14:51:29 -05:00
Andrei Betlen	75d0527fd7	Bump version	2024-01-04 18:30:12 -05:00
Fedor Moiseev	907b9e9d42	Add Saiga chat format. (#1050)	2024-01-04 18:12:58 -05:00
xaviviro	cf743ec5d3	Added ChatGLM chat format (#1059) Co-authored-by: Xavier Vinaixa Rosello <xaviviro@MacBook-Pro-de-Xavier.local>	2024-01-04 18:12:02 -05:00
Andrei Betlen	eb9c7d4ed8	Update llama.cpp	2024-01-03 22:04:04 -05:00
Andrei Betlen	011c3630f5	Bump version	2023-12-27 17:35:02 -05:00
Andrei Betlen	92284f32cb	Add HIP_PATH to dll search directories for windows users.	2023-12-22 15:29:56 -05:00
Andrei Betlen	2b0d3f36fa	set llama_max_devices using library function	2023-12-22 15:19:28 -05:00
Andrei Betlen	d9a1d90fd7	Fix typo	2023-12-22 15:12:27 -05:00
Andrei Betlen	37556bf9c4	Bump version	2023-12-22 14:55:58 -05:00
Andrei Betlen	6d8bc090f9	fix: inccorect bindings for kv override. Based on #1011	2023-12-22 14:52:20 -05:00
Andrei Betlen	522aecb868	docs: add server config docs	2023-12-22 14:37:24 -05:00
Andrei Betlen	6473796343	Update llama.cpp	2023-12-22 14:10:34 -05:00
swg	4b01a873ef	server: Support none defaulting to infinity for completions (#111) * Support defaulting to infinity or -1 for chat completions * Check if completion_tokens is none in error handler. * fix: max_tokens in create completion should match openai spec * Fix __call__ --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2023-12-22 14:05:13 -05:00
Dave	12b7f2f4e9	[Feat] Multi model support (#931 ) * Update Llama class to handle chat_format & caching * Add settings.py * Add util.py & update __main__.py * multimodel * update settings.py * cleanup * delete util.py * Fix /v1/models endpoint * MultiLlama now iterable, app check-alive on "/" * instant model init if file is given * backward compability * revert model param mandatory * fix error * handle individual model config json * refactor * revert chathandler/clip_model changes * handle chat_handler in MulitLlama() * split settings into server/llama * reduce global vars * Update LlamaProxy to handle config files * Add free method to LlamaProxy * update arg parsers & install server alias * refactor cache settings * change server executable name * better var name * whitespace * Revert "whitespace" This reverts commit bc5cf51c64a95bfc9926e1bc58166059711a1cd8. * remove exe_name * Fix merge bugs * Fix type annotations * Fix type annotations * Fix uvicorn app factory * Fix settings * Refactor server * Remove formatting fix * Format * Use default model if not found in model settings * Fix * Cleanup * Fix * Fix * Remove unnused CommandLineSettings * Cleanup * Support default name for copilot-codex models --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2023-12-22 05:51:25 -05:00
Andrei Betlen	4a85442c35	Update llama.cpp	2023-12-22 00:12:37 -05:00
twaka	2f03fb0231	fix text_offset of multi-token characters (#1037) * fix text_offsets for bytes tokens * fix	2023-12-22 00:03:29 -05:00
docmeth02	33cc623346	Implement openai api compatible authentication (#1010)	2023-12-21 13:44:49 -05:00
Andrei Betlen	a05b4da80a	fix: float32 is not JSON serializable when streaming logits.	2023-12-18 18:40:36 -05:00
Andrei Betlen	7df6c32544	Fix type annotations	2023-12-18 18:14:53 -05:00
Andrei Betlen	b703aad79e	Fix type annotation	2023-12-18 18:13:37 -05:00
Andrei Betlen	d0aedfcff6	Fix type annotation	2023-12-18 18:12:49 -05:00
Eduard Christian Dumitrescu	2993936b10	Fix ctypes definitions of `llama_kv_cache_view_update` and `llama_kv_cache_view_free`. (#1028)	2023-12-18 18:11:26 -05:00
Andrei Betlen	5e863d8a3b	Bump version	2023-12-18 16:09:18 -05:00
Andrei Betlen	095c650006	Add offload_kqv option to llama and server	2023-12-18 15:36:09 -05:00
Andrei Betlen	472b344ae3	Remove unnused import	2023-12-18 15:32:40 -05:00
kddubey	6b2e0e05b4	perf: Don't convert logprobs arrays to lists (#1021)	2023-12-18 14:28:12 -05:00
Brandon Roberts	62944df142	Bugfix: Remove f16_kv, add offload_kqv field (#1019) F16_KV appears to have been removed here: `af99c6fbfc` This addresses two issues: - #995 which just requests to add the KV cache offloading param - #1006 a NULL ptr exception when using the embeddings (introduced by leaving f16_kv in the fields struct)	2023-12-18 14:27:11 -05:00
Daniele Morotti	f1c631dc53	Bug fixed with n_ctx=0 (#1015) If the n_ctx is set to 0 the code should use the maximum context length of the selected model, but it didn't work. There was a problem with the initialization of this parameter and a related problem with 'n_batch'.	2023-12-16 18:59:50 -05:00
kddubey	5a8944672f	Fix logits_to_logprobs for 2-D and 3-D logits (#1002) * Fix logits_to_logprobs for 2-D and 3-D logits * Set dtype to single * Test size	2023-12-16 18:59:26 -05:00
Andrei Betlen	534b1ea9b5	Update llama.cpp	2023-12-16 18:57:43 -05:00
Andrei Betlen	cbce061ffd	Bump version	2023-12-13 21:52:29 -05:00
yhfgyyf	8b4db732bd	Add qwen chat format (#1005)	2023-12-13 21:43:43 -05:00
Andrei Betlen	690c563b60	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-12-13 21:43:19 -05:00
Andrei Betlen	c0fc0a1e82	Update llama.cpp	2023-12-13 21:43:16 -05:00
Radoslav Gerganov	8e44a32075	Add support for running the server with SSL (#994)	2023-12-11 20:47:11 -05:00
Tanner Hobson	ef22e478db	Replace logits_to_logprobs implementation with numpy equivalent to llama.cpp (#991) See #990. This change makes the logits_to_logprobs function equivalent to the version in the llama.cpp repository. It uses numpy so it's much faster than the previous version.	2023-12-11 20:46:27 -05:00
zocainViken	ac35f68e4d	Fix UnsupportedOperation: fileno in suppress_stdout_stderr (#961) * bug fixing * llava from readme got this error: UnsupportedOperation: fileno quick fix by checking hasattr * multi modal params fix: add logits = True -> to make llava work * multi modal params fix: add logits = True -> to make llava work --------- Co-authored-by: Andrei <abetlen@gmail.com>	2023-12-11 20:44:51 -05:00
chiensen	b938cccf05	Add Pygmalion chat format (#986)	2023-12-11 20:44:04 -05:00
Andrei Betlen	c1e73e73a3	Bump version	2023-12-11 10:26:42 -05:00
Andrei Betlen	ec26f364cc	Remove f16_kv	2023-12-11 10:25:37 -05:00
Andrei Betlen	f1edc66b21	Update llama.cpp	2023-12-11 10:21:35 -05:00
kddubey	b069d06346	Fix #891 (#952)	2023-11-29 05:39:52 -05:00
Andrei Betlen	ad963a0961	Bump version	2023-11-28 04:58:20 -05:00
Andrei Betlen	e3941d9c67	Make building llava optional	2023-11-28 04:55:21 -05:00
Andrei Betlen	7f3704b896	Bump version	2023-11-27 19:14:25 -05:00
Andrei Betlen	396dbf0b2b	docs: Improve low-level docstrings	2023-11-27 19:03:02 -05:00

660 commits
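
Commits ef22e478db and 5a8944672f above describe a numpy-based logits_to_logprobs that handles 2-D and 3-D logits and uses single precision. The sketch below is not the repository's code; it is a minimal log-softmax written under the assumption that the function normalizes over the last axis (the `axis` parameter and the max-subtraction step are illustrative choices, not taken from the commits).

```python
import numpy as np


def logits_to_logprobs(logits, axis=-1):
    """Illustrative log-softmax over `axis`; not the project's implementation."""
    # Single precision, as mentioned in commit 5a8944672f ("Set dtype to single").
    logits = np.asarray(logits, dtype=np.single)
    # Subtract the per-row maximum so exp() cannot overflow for large logits.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    # log(sum(exp(x))) along the vocabulary axis; keepdims lets it broadcast back.
    log_norm = np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))
    return shifted - log_norm


# Works unchanged for 1-D, 2-D (tokens x vocab) and 3-D (batch x tokens x vocab) input.
batch = np.array([[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]])
logprobs = logits_to_logprobs(batch)
```

Because the reduction keeps its dimensions, the same function applies to any input rank, and the output always has the shape of the input.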