baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	b1637c2319	Bump version	2024-02-13 12:35:04 -05:00
Andrew Lapp	d6be5333e1	fix: sample idx off-by-one error for logit_processors (#1179) * fix sample_idx off-by-one error * self._scores is indexed differently, only modify the index within self._input_ids --------- Co-authored-by: Andrew Lapp <andrew@rew.la> Co-authored-by: Andrei <abetlen@gmail.com>	2024-02-13 12:26:07 -05:00
Andrei Betlen	f7cdf78788	Update llama.cpp	2024-02-13 12:24:00 -05:00
Andrei Betlen	68fb71b6a2	fix: missing generation_prompt in chatml-function-calling	2024-02-13 03:24:41 -05:00
Andrei Betlen	4b0e3320bd	fix: minor formatting bugs for chatml-function-calling	2024-02-13 03:11:35 -05:00
Andrei Betlen	6fe8b427e1	Bump version	2024-02-13 02:46:52 -05:00
Andrei Betlen	d1822fed6b	fix: Don't change order of json schema object properties unless prop_order is passed, Closes #1180	2024-02-13 02:44:00 -05:00
Andrei Betlen	5efc45bdfd	Update llama.cpp	2024-02-13 02:43:07 -05:00
Andrei Betlen	4348a6cdf0	docs: Fix typo	2024-02-13 02:04:54 -05:00
Andrei Betlen	d605875772	Bump version	2024-02-12 16:28:30 -05:00
Andrei Betlen	b82b0e1014	docs: Temporarily revert function calling docs	2024-02-12 16:27:43 -05:00
Andrei Betlen	cb791716b4	fix: Always set logits_all = True when using speculative decoding	2024-02-12 16:19:05 -05:00
Andrei	153a0049d9	feat: Generic chatml Function Calling (#957) * Add demo notebook * Add initial chat handler * Update OpenAI types * Add generic chatml function calling (wip) * Update chatml generic function calling. * Progress on auto-tool calls * fix streaming functions * Remove print statements * fix: Suppress output from llama.cpp init and grammar creation * Add OpenAI v1 python api compatible chat completion function * Support non-streaming multi-tool calls * Format * Include function_call in response.	2024-02-12 15:56:07 -05:00
Andrei Betlen	69413ce08e	Update llama.cpp	2024-02-11 19:00:17 -05:00
Andrei Betlen	9368670639	Update llama.cpp	2024-02-11 14:02:46 -05:00
Connor	a05d90446f	fix: Circular dependancy preventing early Llama object free (#1176) commit `901827013b` introduced a cyclic dependency within Llama objects. That change causes old models to linger in memory longer than necessary, thereby creating memory bloat in most applications attempting to switch between models at runtime. This patch simply removes the problematic line, allowing models to deallocate without relying on GC. One might also consider combining `weakref.ref` with a `@property` if the `llama` attribute is absolutely necessary to expose in the tokenizer class.	2024-02-11 13:57:57 -05:00
Akarshan Biswas	918ff27e50	docs: Set the correct command for compiling with syscl support (#1172)	2024-02-11 13:55:15 -05:00
Douglas Hanley	19b55ad3e5	feat: use gpu backend for clip if available (#1175)	2024-02-11 13:53:59 -05:00
Andrei Betlen	63b0c37836	Update llama.cpp	2024-02-09 13:36:58 -05:00
Andrei Betlen	4abb8c9386	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2024-02-09 13:32:31 -05:00
Andrei Betlen	e16f06e6eb	fix: revert _create_completions.	2024-02-09 02:02:13 -05:00
Andrei Betlen	dfc1b17341	Update llama.cpp	2024-02-08 23:38:12 -05:00
Andrei Betlen	5b4ad6c80b	Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main	2024-02-08 23:34:45 -05:00
Andrei Betlen	85d3374b4d	fix: broken import	2024-02-08 01:13:28 -05:00
Andrei Betlen	b5fca911b5	feat: Move tokenizer to own module	2024-02-08 01:08:18 -05:00
Andrei Betlen	2ef7ba3aed	misc: rename grammar test	2024-02-08 01:07:44 -05:00
Jeffrey Fong	901827013b	feat: Integrate functionary v1.4 and v2 models + add custom tokenizer support to Llama class (#1078) * convert functionary-v1 chat handler to use hf autotokenizer * add hf_tokenizer + inteegrate functionary-v1.4 prompt template * integrate functionary v2 prompt template * update readme * set up parallel function calling wip * set up parallel function calling * Update README.md * Update README.md * refactor tokenizers * include old functionary handler for backward compatibility * add hf_tokenizer_path in server ModelSettings * convert functionary-v1 chat handler to use hf autotokenizer * add hf_tokenizer + inteegrate functionary-v1.4 prompt template * integrate functionary v2 prompt template * update readme * set up parallel function calling wip * resolve merge conflict * Update README.md * Update README.md * refactor tokenizers * include old functionary handler for backward compatibility * add hf_tokenizer_path in server ModelSettings * Cleanup PR, fix breaking changes * Use hf_pretrained_model_name_or_path for tokenizer * fix hf tokenizer in streaming * update README * refactor offset mapping --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-02-07 20:07:03 -05:00
Andrei Betlen	ce12775490	Update llama.cpp	2024-02-06 18:50:56 -05:00
Andrei Betlen	34f31040f6	Bump version	2024-02-06 12:47:59 -05:00
Andrei Betlen	5e3e67af47	Update llama.cpp	2024-02-06 12:44:07 -05:00
Andrei Betlen	310fbf4e49	Update llama.cpp	2024-02-05 22:07:14 -05:00
Andrei Betlen	59760c85ed	fix: Use llama_log_callback to avoid suppress_stdout_stderr	2024-02-05 21:52:12 -05:00
Andrei Betlen	3553b14670	Update llama.cpp	2024-02-05 13:26:50 -05:00
Andrei	7467f129e5	Revert "Fix: fileno error google colab (#729) (#1156)" (#1157) This reverts commit `bebfba0f08`.	2024-02-02 12:18:55 -05:00
Dulsara	bebfba0f08	Fix: fileno error google colab (#729) (#1156) Instead of using a devnull just made a dummy class with a 'write()' method that does nothing.	2024-02-02 12:05:46 -05:00
Andrei Betlen	8a5911bd5d	Update llama.cpp	2024-02-02 09:41:27 -05:00
Andrei Betlen	de526d0214	Update llama.cpp	2024-02-01 12:35:31 -05:00
Andrei Betlen	3322eadbf3	Bump version	2024-01-31 15:10:18 -05:00
Andrei Betlen	a8cb34eacd	Update llama.cpp	2024-01-31 15:05:51 -05:00
Andrei	fb762a6041	Add speculative decoding (#1120) * Add draft model param to llama class, implement basic prompt lookup decoding draft model * Use samplingcontext for sampling * Use 1d array * Use draft model for sampling * Fix dumb mistake * Allow for later extensions to the LlamaDraftModel api * Cleanup * Adaptive candidate prediction * Update implementation to match hf transformers * Tuning * Fix bug where last token was not used for ngram prediction * Remove heuristic for num_pred_tokens (no benefit) * fix: n_candidates bug. * Add draft_model_num_pred_tokens server setting * Cleanup * Update README	2024-01-31 14:08:14 -05:00
Andrei Betlen	71e3e4c435	Update llama.cpp	2024-01-31 10:41:42 -05:00
Andrei Betlen	2b37d8e438	fix: Run server command. Closes #1143	2024-01-31 10:37:19 -05:00
Andrei Betlen	078cca0361	fix: Pass raise_exception and add_generation_prompt to jinja2 chat template	2024-01-31 08:42:21 -05:00
Andrei Betlen	411494706a	Update llama.cpp	2024-01-31 08:35:21 -05:00
Andrei Betlen	bf9e824922	Bump version	2024-01-30 12:27:27 -05:00
Andrei Betlen	247a16de66	docs: Update README	2024-01-30 12:23:07 -05:00
Andrei Betlen	13b7ced7da	Update llama.cpp	2024-01-30 12:21:41 -05:00
Andrei Betlen	011cd84ded	Update llama.cpp	2024-01-30 09:48:09 -05:00
Andrei	da003d8768	Automatically set chat format from gguf (#1110) * Use jinja formatter to load chat format from gguf * Fix off-by-one error in metadata loader * Implement chat format auto-detection	2024-01-29 14:22:23 -05:00
Andrei Betlen	059f6b3ac8	docs: fix typos	2024-01-29 11:02:25 -05:00

1518 commits
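Commit a05d90446f suggests combining `weakref.ref` with a `@property` if the tokenizer must still expose a `llama` attribute. The following is a minimal Python sketch of that pattern. `Model` and `Tokenizer` are hypothetical stand-ins, not the actual classes in llama-cpp-python. The tokenizer keeps only a weak reference back to its owner. This means no reference cycle forms, and dropping the last strong reference to the model frees it without waiting for the cyclic garbage collector.

```python
import weakref


class Model:
    """Hypothetical stand-in for a heavyweight object such as a loaded model."""

    def __init__(self, name: str) -> None:
        self.name = name
        # The owner holds the only strong reference in the owner/tokenizer pair.
        self.tokenizer = Tokenizer(self)


class Tokenizer:
    """Hypothetical tokenizer that exposes its owner without keeping it alive."""

    def __init__(self, owner: Model) -> None:
        # Weak back-reference: does not increase the owner's reference count.
        self._owner_ref = weakref.ref(owner)

    @property
    def llama(self) -> Model:
        # Dereference the weak reference; it returns None once the owner is freed.
        owner = self._owner_ref()
        if owner is None:
            raise ReferenceError("owning model has already been freed")
        return owner
```

Under CPython, `del model` on the last strong reference frees the model immediately through reference counting, and `model.tokenizer.llama` keeps working while the model is alive. With a plain `self.llama = owner` attribute instead, the two objects would form a cycle, and the model would linger until the garbage collector runs.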
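Commit fb762a6041 adds speculative decoding with a basic prompt lookup decoding draft model. The idea can be sketched as follows, assuming token IDs are plain Python integers:

- Take the last n tokens of the context.
- Search earlier in the context for the same n-gram.
- Propose the tokens that followed it as draft candidates, trying longer n-grams first.

The function name `prompt_lookup_draft` and the `max_ngram_size` parameter are illustrative assumptions. `num_pred_tokens` echoes the name used in the commit message, but this is not the library's implementation or API.

```python
from typing import List


def prompt_lookup_draft(
    input_ids: List[int],
    max_ngram_size: int = 2,
    num_pred_tokens: int = 10,
) -> List[int]:
    """Propose draft tokens by matching the trailing n-gram against earlier context."""
    # Prefer longer n-grams (more specific matches), falling back to shorter ones.
    for ngram_size in range(max_ngram_size, 0, -1):
        if len(input_ids) <= ngram_size:
            continue  # not enough context for an earlier occurrence
        ngram = input_ids[-ngram_size:]
        # Scan earlier start positions, most recent first, excluding the
        # trailing n-gram itself so at least one following token exists.
        for start in range(len(input_ids) - ngram_size - 1, -1, -1):
            if input_ids[start:start + ngram_size] == ngram:
                follow = start + ngram_size
                # The tokens that followed the earlier match become the draft.
                return input_ids[follow:follow + num_pred_tokens]
    return []  # no match: fall back to ordinary one-token-at-a-time decoding
```

For example, with context `[1, 2, 3, 4, 1, 2]` the trailing bigram `[1, 2]` also occurs at the start, so `[3, 4, ...]` is proposed. The main model then scores every drafted position in one forward pass and keeps the longest accepted prefix. Needing logits at each drafted position is consistent with commit cb791716b4 forcing `logits_all = True` when speculative decoding is used.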