baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	bb610b9428	Update llama.cpp	2024-01-11 22:51:12 -05:00
Andrei Betlen	f0159663d9	Bump version	2024-01-10 02:51:17 -05:00
Stephen Hankinson	df3be58d6c	Add ability to pass in penalize_nl param (#1068)	2024-01-10 02:46:27 -05:00
Joseph Turian	2ddce7294e	print_grammar to stderr (#1052)	2024-01-10 02:46:03 -05:00
Andrei Betlen	1ae05c102b	Update llama.cpp	2024-01-08 14:51:29 -05:00
Andrei Betlen	75d0527fd7	Bump version	2024-01-04 18:30:12 -05:00
Fedor Moiseev	907b9e9d42	Add Saiga chat format. (#1050)	2024-01-04 18:12:58 -05:00
xaviviro	cf743ec5d3	Added ChatGLM chat format (#1059) Co-authored-by: Xavier Vinaixa Rosello <xaviviro@MacBook-Pro-de-Xavier.local>	2024-01-04 18:12:02 -05:00
Andrei Betlen	eb9c7d4ed8	Update llama.cpp	2024-01-03 22:04:04 -05:00
Andrei Betlen	011c3630f5	Bump version	2023-12-27 17:35:02 -05:00
Andrei Betlen	92284f32cb	Add HIP_PATH to dll search directories for windows users.	2023-12-22 15:29:56 -05:00
Andrei Betlen	2b0d3f36fa	set llama_max_devices using library function	2023-12-22 15:19:28 -05:00
Andrei Betlen	d9a1d90fd7	Fix typo	2023-12-22 15:12:27 -05:00
Andrei Betlen	37556bf9c4	Bump version	2023-12-22 14:55:58 -05:00
Andrei Betlen	6d8bc090f9	fix: inccorect bindings for kv override. Based on #1011	2023-12-22 14:52:20 -05:00
Andrei Betlen	522aecb868	docs: add server config docs	2023-12-22 14:37:24 -05:00
Andrei Betlen	6473796343	Update llama.cpp	2023-12-22 14:10:34 -05:00
swg	4b01a873ef	server: Support none defaulting to infinity for completions (#111) * Support defaulting to infinity or -1 for chat completions * Check if completion_tokens is none in error handler. * fix: max_tokens in create completion should match openai spec * Fix __call__ --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2023-12-22 14:05:13 -05:00
Dave	12b7f2f4e9	[Feat] Multi model support (#931 ) * Update Llama class to handle chat_format & caching * Add settings.py * Add util.py & update __main__.py * multimodel * update settings.py * cleanup * delete util.py * Fix /v1/models endpoint * MultiLlama now iterable, app check-alive on "/" * instant model init if file is given * backward compability * revert model param mandatory * fix error * handle individual model config json * refactor * revert chathandler/clip_model changes * handle chat_handler in MulitLlama() * split settings into server/llama * reduce global vars * Update LlamaProxy to handle config files * Add free method to LlamaProxy * update arg parsers & install server alias * refactor cache settings * change server executable name * better var name * whitespace * Revert "whitespace" This reverts commit bc5cf51c64a95bfc9926e1bc58166059711a1cd8. * remove exe_name * Fix merge bugs * Fix type annotations * Fix type annotations * Fix uvicorn app factory * Fix settings * Refactor server * Remove formatting fix * Format * Use default model if not found in model settings * Fix * Cleanup * Fix * Fix * Remove unnused CommandLineSettings * Cleanup * Support default name for copilot-codex models --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2023-12-22 05:51:25 -05:00
Andrei Betlen	4a85442c35	Update llama.cpp	2023-12-22 00:12:37 -05:00
twaka	2f03fb0231	fix text_offset of multi-token characters (#1037) * fix text_offsets for bytes tokens * fix	2023-12-22 00:03:29 -05:00
docmeth02	33cc623346	Implement openai api compatible authentication (#1010)	2023-12-21 13:44:49 -05:00
Andrei Betlen	a05b4da80a	fix: float32 is not JSON serializable when streaming logits.	2023-12-18 18:40:36 -05:00
Andrei Betlen	7df6c32544	Fix type annotations	2023-12-18 18:14:53 -05:00
Andrei Betlen	b703aad79e	Fix type annotation	2023-12-18 18:13:37 -05:00
Andrei Betlen	d0aedfcff6	Fix type annotation	2023-12-18 18:12:49 -05:00
Eduard Christian Dumitrescu	2993936b10	Fix ctypes definitions of `llama_kv_cache_view_update` and `llama_kv_cache_view_free`. (#1028)	2023-12-18 18:11:26 -05:00
Andrei Betlen	5e863d8a3b	Bump version	2023-12-18 16:09:18 -05:00
Andrei Betlen	095c650006	Add offload_kqv option to llama and server	2023-12-18 15:36:09 -05:00
Andrei Betlen	472b344ae3	Remove unnused import	2023-12-18 15:32:40 -05:00
kddubey	6b2e0e05b4	perf: Don't convert logprobs arrays to lists (#1021)	2023-12-18 14:28:12 -05:00
Brandon Roberts	62944df142	Bugfix: Remove f16_kv, add offload_kqv field (#1019) F16_KV appears to have been removed here: `af99c6fbfc` This addresses two issues: - #995 which just requests to add the KV cache offloading param - #1006 a NULL ptr exception when using the embeddings (introduced by leaving f16_kv in the fields struct)	2023-12-18 14:27:11 -05:00
Daniele Morotti	f1c631dc53	Bug fixed with n_ctx=0 (#1015) If the n_ctx is set to 0 the code should use the maximum context length of the selected model, but it didn't work. There was a problem with the initialization of this parameter and a related problem with 'n_batch'.	2023-12-16 18:59:50 -05:00
kddubey	5a8944672f	Fix logits_to_logprobs for 2-D and 3-D logits (#1002) * Fix logits_to_logprobs for 2-D and 3-D logits * Set dtype to single * Test size	2023-12-16 18:59:26 -05:00
Andrei Betlen	534b1ea9b5	Update llama.cpp	2023-12-16 18:57:43 -05:00
Andrei Betlen	cbce061ffd	Bump version	2023-12-13 21:52:29 -05:00
yhfgyyf	8b4db732bd	Add qwen chat format (#1005)	2023-12-13 21:43:43 -05:00
Andrei Betlen	690c563b60	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-12-13 21:43:19 -05:00
Andrei Betlen	c0fc0a1e82	Update llama.cpp	2023-12-13 21:43:16 -05:00
Radoslav Gerganov	8e44a32075	Add support for running the server with SSL (#994)	2023-12-11 20:47:11 -05:00
Tanner Hobson	ef22e478db	Replace logits_to_logprobs implementation with numpy equivalent to llama.cpp (#991) See #990. This change makes the logits_to_logprobs function equivalent to the version in the llama.cpp repository. It uses numpy so it's much faster than the previous version.	2023-12-11 20:46:27 -05:00
zocainViken	ac35f68e4d	Fix UnsupportedOperation: fileno in suppress_stdout_stderr (#961) * bug fixing * llava from readme got this error: UnsupportedOperation: fileno quick fix by checking hasattr * multi modal params fix: add logits = True -> to make llava work * multi modal params fix: add logits = True -> to make llava work --------- Co-authored-by: Andrei <abetlen@gmail.com>	2023-12-11 20:44:51 -05:00
chiensen	b938cccf05	Add Pygmalion chat format (#986)	2023-12-11 20:44:04 -05:00
Andrei Betlen	c1e73e73a3	Bump version	2023-12-11 10:26:42 -05:00
Andrei Betlen	ec26f364cc	Remove f16_kv	2023-12-11 10:25:37 -05:00
Andrei Betlen	f1edc66b21	Update llama.cpp	2023-12-11 10:21:35 -05:00
kddubey	b069d06346	Fix #891 (#952)	2023-11-29 05:39:52 -05:00
Andrei Betlen	ad963a0961	Bump version	2023-11-28 04:58:20 -05:00
Andrei Betlen	e3941d9c67	Make building llava optional	2023-11-28 04:55:21 -05:00
Andrei Betlen	7f3704b896	Bump version	2023-11-27 19:14:25 -05:00
Andrei Betlen	396dbf0b2b	docs: Improve low-level docstrings	2023-11-27 19:03:02 -05:00
Andrei Betlen	a928893d03	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-11-26 15:57:13 -05:00
Andrei Betlen	6308f21d5e	docs: Update Llama docs	2023-11-26 15:56:40 -05:00
Gardner Bickford	c2d63a7148	fix: Typo in the Open Orca chat format #874 (#947)	2023-11-26 15:39:18 -05:00
Andrei Betlen	f03a38e62a	Update llama.cpp	2023-11-26 15:38:22 -05:00
Andrei Betlen	1a7bf2037b	docs: Update openapi endpoint names	2023-11-24 03:39:29 -05:00
Andrei Betlen	4026166e68	docs: Update completion and chat_completion parameter docstrings	2023-11-24 03:24:19 -05:00
Andrei Betlen	8c3aa7858b	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-11-24 00:15:36 -05:00
Andrei Betlen	de2e2bc083	misc fix verbose printing in functionary model	2023-11-23 20:14:23 -05:00
Andrei Betlen	36048d46af	Update llama.cpp	2023-11-23 16:26:00 -05:00
mrfakename	d68fc07b1b	Add Zephyr format (#937)	2023-11-23 01:20:08 -05:00
caiyesd	4184835078	Add chat format to support baichuan (#938) Signed-off-by: caiyesd <caiyesd@gmail.com>	2023-11-23 01:19:50 -05:00
Andrei Betlen	c647f01609	Add from_json_schema to LlamaGrammar	2023-11-23 00:27:00 -05:00
Andrei Betlen	be1f64d569	docs: Add docstrings from llama.cpp	2023-11-23 00:26:26 -05:00
Andrei Betlen	b6bb7ac76a	docs: Add Llama class example	2023-11-22 23:10:04 -05:00
caiyesd	b8f29f4bf0	Add baichuan-2 chat format (#936) Signed-off-by: caiyesd <caiyesd@gmail.com>	2023-11-22 06:08:06 -05:00
Andrei Betlen	8b6ca22846	Fix type warnings for json schema grammar converter	2023-11-21 13:32:00 -05:00
Andrei Betlen	230fc8b535	Bump version	2023-11-21 05:04:55 -05:00
Andrei Betlen	128dc4731f	Fix #569	2023-11-21 04:39:05 -05:00
Andrei Betlen	7a3f87846b	Format	2023-11-21 04:02:20 -05:00
Andrei Betlen	422ebc89ce	Fix: Add logit_bias to all completion api methods	2023-11-21 04:01:36 -05:00
Andrei Betlen	07e47f55ba	Add support for logit_bias outside of server api. Closes #827	2023-11-21 03:59:46 -05:00
Maarten ter Huurne	c21edb6908	Do not set `grammar` to `None` for new `LlamaGrammar` objects (#834) * Do not set `grammar` to `None` for new `LlamaGrammar` objects The `grammar` attribute is written by `init()`, but that method always returns `None`, so `__init__()` would then discard the previously written object. * Add minimal test for grammar parsing	2023-11-21 00:23:18 -05:00
mrfakename	ef65fc5ff4	Add MistralLite, Intel, and OpenChat prompt formats (#927) * Add MistralLite format * Update llama_chat_format.py * Update llama_chat_format.py	2023-11-21 00:19:25 -05:00
TK-Master	b8438f70b5	Added support for min_p (#921) * Added support for min_p My small contribution to this great project. Ref: https://github.com/ggerganov/llama.cpp/pull/3841 Closes: https://github.com/abetlen/llama-cpp-python/issues/911 * Fix for negative temp (sample_softmax)	2023-11-20 23:21:33 -05:00
Andrei Betlen	a34d480141	Fix #929	2023-11-20 22:50:59 -05:00
Andrei Betlen	2c2afa320f	Update llama.cpp	2023-11-20 14:11:33 -05:00
Andrei Betlen	f2901d840e	Bump version	2023-11-14 14:10:00 -05:00
Andrei Betlen	01846a76b9	Bump version	2023-11-10 16:36:12 -05:00
Andrei Betlen	b7e60b66f4	Bump version	2023-11-10 06:21:24 -05:00
Andrei Betlen	6f0b0b1b84	Fix sampling bug when logits_all=False	2023-11-10 05:15:41 -05:00
Andrei Betlen	d9b38e3e3a	Potential bugfix for eval	2023-11-10 04:41:19 -05:00
Andrei Betlen	b84d76a844	Fix: add default stop sequence to chatml chat format	2023-11-10 04:24:48 -05:00
Andrei Betlen	1b376c62b7	Update functionary for new OpenAI API	2023-11-10 02:51:58 -05:00
Andrei Betlen	17da8fb446	Add missing tool_calls finish_reason	2023-11-10 02:51:06 -05:00
Andrei Betlen	770df34436	Add $ref and $defs support to json schema converter	2023-11-10 02:50:46 -05:00
Andrei Betlen	faeae181b1	Fix: json_schema_to_gbnf should take string dump of json schema as input	2023-11-10 02:50:17 -05:00
Andrei Betlen	e7962d2c73	Fix: default max_tokens matches openai api (16 for completion, max length for chat completion)	2023-11-10 02:49:27 -05:00
Andrei Betlen	b62c449839	Bugfix: missing response_format for functionary and llava chat handlers	2023-11-09 00:55:23 -05:00
Andrei Betlen	fd41ed3a90	Add set_seed to Llama class	2023-11-08 11:09:41 -05:00
Andrei Betlen	ca4cb88351	Fix destructor NoneType is not callable error	2023-11-08 11:05:45 -05:00
Andrei Betlen	01cb3a0381	Bump version	2023-11-08 00:54:54 -05:00
Andrei Betlen	b30b9c338b	Add JSON mode support. Closes #881	2023-11-08 00:07:16 -05:00
Andrei Betlen	4852a6a39c	Fix built in GBNF grammar rules	2023-11-08 00:06:22 -05:00
Andrei Betlen	64f5153c35	Add seed parameter to chat handlers	2023-11-07 23:41:29 -05:00
Andrei Betlen	86aeb9f3a1	Add seed parameter support for completion and chat_completion requests. Closes #884	2023-11-07 23:37:28 -05:00
Damian Stewart	aab74f0b2b	Multimodal Support (Llava 1.5) (#821 ) * llava v1.5 integration * Point llama.cpp to fork * Add llava shared library target * Fix type * Update llama.cpp * Add llava api * Revert changes to llama and llama_cpp * Update llava example * Add types for new gpt-4-vision-preview api * Fix typo * Update llama.cpp * Update llama_types to match OpenAI v1 API * Update ChatCompletionFunction type * Reorder request parameters * More API type fixes * Even More Type Updates * Add parameter for custom chat_handler to Llama class * Fix circular import * Convert to absolute imports * Fix * Fix pydantic Jsontype bug * Accept list of prompt tokens in create_completion * Add llava1.5 chat handler * Add Multimodal notebook * Clean up examples * Add server docs --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2023-11-07 22:48:51 -05:00
Andrei Betlen	56171cf7bf	Bump version	2023-11-06 09:37:55 -05:00
Andrei Betlen	be0add1b2d	Fix type bug	2023-11-06 09:30:38 -05:00
Andrei Betlen	e214a58422	Refactor Llama class internals	2023-11-06 09:16:36 -05:00

611 commits