baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	59760c85ed	fix: Use llama_log_callback to avoid suppress_stdout_stderr	2024-02-05 21:52:12 -05:00
Andrei Betlen	3553b14670	Update llama.cpp	2024-02-05 13:26:50 -05:00
Andrei	7467f129e5	Revert "Fix: fileno error google colab (#729) (#1156)" (#1157) This reverts commit `bebfba0f08`.	2024-02-02 12:18:55 -05:00
Dulsara	bebfba0f08	Fix: fileno error google colab (#729) (#1156) Instead of using a devnull just made a dummy class with a 'write()' method that does nothing.	2024-02-02 12:05:46 -05:00
Andrei Betlen	3322eadbf3	Bump version	2024-01-31 15:10:18 -05:00
Andrei	fb762a6041	Add speculative decoding (#1120) * Add draft model param to llama class, implement basic prompt lookup decoding draft model * Use samplingcontext for sampling * Use 1d array * Use draft model for sampling * Fix dumb mistake * Allow for later extensions to the LlamaDraftModel api * Cleanup * Adaptive candidate prediction * Update implementation to match hf transformers * Tuning * Fix bug where last token was not used for ngram prediction * Remove heuristic for num_pred_tokens (no benefit) * fix: n_candidates bug. * Add draft_model_num_pred_tokens server setting * Cleanup * Update README	2024-01-31 14:08:14 -05:00
Andrei Betlen	71e3e4c435	Update llama.cpp	2024-01-31 10:41:42 -05:00
Andrei Betlen	078cca0361	fix: Pass raise_exception and add_generation_prompt to jinja2 chat template	2024-01-31 08:42:21 -05:00
Andrei Betlen	bf9e824922	Bump version	2024-01-30 12:27:27 -05:00
Andrei Betlen	011cd84ded	Update llama.cpp	2024-01-30 09:48:09 -05:00
Andrei	da003d8768	Automatically set chat format from gguf (#1110) * Use jinja formatter to load chat format from gguf * Fix off-by-one error in metadata loader * Implement chat format auto-detection	2024-01-29 14:22:23 -05:00
Andrei Betlen	464af5b39f	Bump version	2024-01-29 10:46:04 -05:00
Andrei Betlen	9ae5819ee4	Add chat format test.	2024-01-29 00:59:01 -05:00
Rafaelblsilva	ce38dbdf07	Add mistral instruct chat format as "mistral-instruct" (#799) * Added mistral instruct chat format as "mistral" * Fix stop sequence (merge issue) * Update chat format name to `mistral-instruct` --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-01-29 00:34:42 -05:00
Andrei Betlen	52c4a84faf	Bump version	2024-01-28 19:35:37 -05:00
Andrei Betlen	c1d0fff8a9	Bump version	2024-01-27 18:36:56 -05:00
Andrei	d8f6914f45	Add json schema mode (#1122) * Add json schema mode * Add llava chat format support	2024-01-27 16:52:18 -05:00
Andrei Betlen	35918873b4	Update llama.cpp	2024-01-26 11:45:48 -05:00
Andrei Betlen	f5cc6b3053	Bump version	2024-01-25 11:28:16 -05:00
Andrei Betlen	cde7514c3d	feat(server): include llama-cpp-python version in openapi spec	2024-01-25 11:23:18 -05:00
Andrei Betlen	c970d41a85	fix: llama_log_set should be able to accept null pointer	2024-01-24 10:38:30 -05:00
Andrei Betlen	9677a1f2c8	fix: Check order	2024-01-23 22:28:03 -05:00
Andrei Betlen	4d6b2f7b91	fix: format	2024-01-23 22:08:27 -05:00
Phil H	fe5d6ea648	fix: GGUF metadata KV overrides, re #1011 (#1116) * kv overrides another attempt * add sentinel element, simplify array population * ensure sentinel element is zeroed	2024-01-23 22:00:38 -05:00
Andrei Betlen	fcdf337d84	Update llama.cpp	2024-01-22 11:25:11 -05:00
Andrei Betlen	5b982d0f8c	fix: use both eos and bos tokens as stop sequences for hf-tokenizer-config chat format.	2024-01-22 08:32:48 -05:00
Andrei Betlen	2ce0b8aa2c	Bump version	2024-01-21 20:30:24 -05:00
Andrei Betlen	d3f5528ca8	fix: from_json_schema oneof/anyof bug. Closes #1097	2024-01-21 19:06:53 -05:00
Andrei Betlen	24f39454e9	fix: pass chat handler not chat formatter for huggingface autotokenizer and tokenizer_config formats.	2024-01-21 18:38:04 -05:00
Andrei Betlen	7f3209b1eb	feat: Add add_generation_prompt option for jinja2chatformatter.	2024-01-21 18:37:24 -05:00
Andrei Betlen	be09318c26	feat: Add Jinja2ChatFormatter	2024-01-19 15:04:42 -05:00
Andrei Betlen	5a34c57e54	feat: Expose gguf model metadata in metadata property	2024-01-19 10:46:03 -05:00
Andrei Betlen	833a7f1a86	Bump version	2024-01-19 09:03:35 -05:00
Andrei Betlen	3babe3512c	Fix mirostat sampling	2024-01-19 08:31:59 -05:00
Andrei Betlen	141293a75b	Fix python3.8 support	2024-01-19 08:17:49 -05:00
Andrei Betlen	656f3d8968	Bump version	2024-01-18 21:30:36 -05:00
Andrei Betlen	89cce50f8c	Update llama.cpp	2024-01-18 21:21:49 -05:00
Andrei Betlen	b8fc1c7d83	feat: Add ability to load chat format from huggingface autotokenizer or tokenizer_config.json files.	2024-01-18 21:21:37 -05:00
Andrei Betlen	48c3b77e6f	Offload KQV by default	2024-01-18 11:08:57 -05:00
Austin	6bfe98bd80	Integration of Jinja2 Templating (#875) * feat: Add support for jinja templating Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * fix: Refactor chat formatter and update interface for jinja templates - Simplify the `llama2_template` in `llama_jinja_format.py` by removing unnecessary line breaks for readability without affecting functionality. - Update `ChatFormatterInterface` constructor to accept a more generic `Optional[object]` type for the template parameter, enhancing flexibility. - Introduce a `template` property to `ChatFormatterInterface` for standardized access to the template string. - Replace `MetaSingleton` metaclass with `Singleton` for the `ChatFormatterFactory` to streamline the singleton implementation. These changes enhance code readability, maintain usability, and ensure consistency in the chat formatter's design pattern usage. * Add outline for Jinja2 templating integration documentation Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * Add jinja2 as a dependency with version range for Hugging Face transformers compatibility Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * Update jinja2 version constraint for mkdocs-material compatibility Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com> * Fix attribute name in AutoChatFormatter - Changed attribute name from `self._renderer` to `self._environment` --------- Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>	2024-01-17 09:47:52 -05:00
Andrei Betlen	7b46bb5a78	Re-order classes in llama.py	2024-01-17 09:16:13 -05:00
Andrei Betlen	cc4630e66f	Move helper classes to _internals submodule	2024-01-17 09:14:00 -05:00
Andrei Betlen	3b92419132	Move cache classes to llama_cache submodule.	2024-01-17 09:09:12 -05:00
Kyle Mistele	9c36688b33	fix(cli): allow passing n_ctx=0 to openAI API server args to use model n_ctx_train field per #1015 (#1093)	2024-01-16 18:54:06 -05:00
anil	cfb7da98ed	Support Accept text/event-stream in chat and completion endpoints, resolves #1083 (#1088) Co-authored-by: Anil Pathak <anil@heyday.com> Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2024-01-16 12:52:52 -05:00
Andrei Betlen	4b11fa83c0	Bump version	2024-01-15 12:54:51 -05:00
Andrei Betlen	84615adbc6	Add split_mode option. Closes #1085	2024-01-15 12:49:20 -05:00
Phil H	76aafa6149	Implement GGUF metadata KV overrides (#1011) * Implement GGUF metadata overrides * whitespace fix * Fix kv overrides. * Fix pointer and pickle * Match llama.cpp kv_overrides cli argument --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-01-15 12:29:29 -05:00
yieldthought	7eff42c239	Avoid "LookupError: unknown encoding: ascii" when open() called in a destructor (#1012) The existing code often causes "LookupError: unknown encoding: ascii" when open() called in a destructor. Saving open in self.open is not enough to avoid this. Instead, we can avoid reopening /dev/null every time by doing it once when the module is loaded.	2024-01-15 10:52:10 -05:00
Mark Neumann	c689ccc728	Fix Pydantic model parsing (#1087)	2024-01-15 10:45:57 -05:00

614 commits