baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	359ae73643	Update llama.cpp	2024-01-14 08:17:22 -05:00
Andrei Betlen	7c898d5684	Update llama.cpp	2024-01-13 22:37:49 -05:00
Andrei Betlen	bb610b9428	Update llama.cpp	2024-01-11 22:51:12 -05:00
Andrei Betlen	f0159663d9	Bump version	2024-01-10 02:51:17 -05:00
Stephen Hankinson	df3be58d6c	Add ability to pass in penalize_nl param (#1068 )	2024-01-10 02:46:27 -05:00
Joseph Turian	2ddce7294e	print_grammar to stderr (#1052 )	2024-01-10 02:46:03 -05:00
Andrei Betlen	1ae05c102b	Update llama.cpp	2024-01-08 14:51:29 -05:00
Andrei Betlen	75d0527fd7	Bump version	2024-01-04 18:30:12 -05:00
Fedor Moiseev	907b9e9d42	Add Saiga chat format. (#1050 )	2024-01-04 18:12:58 -05:00
xaviviro	cf743ec5d3	Added ChatGLM chat format (#1059 ) Co-authored-by: Xavier Vinaixa Rosello <xaviviro@MacBook-Pro-de-Xavier.local>	2024-01-04 18:12:02 -05:00
Andrei Betlen	eb9c7d4ed8	Update llama.cpp	2024-01-03 22:04:04 -05:00
Andrei Betlen	011c3630f5	Bump version	2023-12-27 17:35:02 -05:00
Andrei Betlen	92284f32cb	Add HIP_PATH to dll search directories for windows users.	2023-12-22 15:29:56 -05:00
Andrei Betlen	2b0d3f36fa	set llama_max_devices using library function	2023-12-22 15:19:28 -05:00
Andrei Betlen	d9a1d90fd7	Fix typo	2023-12-22 15:12:27 -05:00
Andrei Betlen	37556bf9c4	Bump version	2023-12-22 14:55:58 -05:00
Andrei Betlen	6d8bc090f9	fix: inccorect bindings for kv override. Based on #1011	2023-12-22 14:52:20 -05:00
Andrei Betlen	522aecb868	docs: add server config docs	2023-12-22 14:37:24 -05:00
Andrei Betlen	6473796343	Update llama.cpp	2023-12-22 14:10:34 -05:00
swg	4b01a873ef	server: Support none defaulting to infinity for completions (#111 ) * Support defaulting to infinity or -1 for chat completions * Check if completion_tokens is none in error handler. * fix: max_tokens in create completion should match openai spec * Fix __call__ --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2023-12-22 14:05:13 -05:00
Dave	12b7f2f4e9	[Feat] Multi model support (#931 ) * Update Llama class to handle chat_format & caching * Add settings.py * Add util.py & update __main__.py * multimodel * update settings.py * cleanup * delete util.py * Fix /v1/models endpoint * MultiLlama now iterable, app check-alive on "/" * instant model init if file is given * backward compability * revert model param mandatory * fix error * handle individual model config json * refactor * revert chathandler/clip_model changes * handle chat_handler in MulitLlama() * split settings into server/llama * reduce global vars * Update LlamaProxy to handle config files * Add free method to LlamaProxy * update arg parsers & install server alias * refactor cache settings * change server executable name * better var name * whitespace * Revert "whitespace" This reverts commit bc5cf51c64a95bfc9926e1bc58166059711a1cd8. * remove exe_name * Fix merge bugs * Fix type annotations * Fix type annotations * Fix uvicorn app factory * Fix settings * Refactor server * Remove formatting fix * Format * Use default model if not found in model settings * Fix * Cleanup * Fix * Fix * Remove unnused CommandLineSettings * Cleanup * Support default name for copilot-codex models --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2023-12-22 05:51:25 -05:00
Andrei Betlen	4a85442c35	Update llama.cpp	2023-12-22 00:12:37 -05:00
twaka	2f03fb0231	fix text_offset of multi-token characters (#1037 ) * fix text_offsets for bytes tokens * fix	2023-12-22 00:03:29 -05:00
docmeth02	33cc623346	Implement openai api compatible authentication (#1010 )	2023-12-21 13:44:49 -05:00
Andrei Betlen	a05b4da80a	fix: float32 is not JSON serializable when streaming logits.	2023-12-18 18:40:36 -05:00
Andrei Betlen	7df6c32544	Fix type annotations	2023-12-18 18:14:53 -05:00
Andrei Betlen	b703aad79e	Fix type annotation	2023-12-18 18:13:37 -05:00
Andrei Betlen	d0aedfcff6	Fix type annotation	2023-12-18 18:12:49 -05:00
Eduard Christian Dumitrescu	2993936b10	Fix ctypes definitions of `llama_kv_cache_view_update` and `llama_kv_cache_view_free`. (#1028 )	2023-12-18 18:11:26 -05:00
Andrei Betlen	5e863d8a3b	Bump version	2023-12-18 16:09:18 -05:00
Andrei Betlen	095c650006	Add offload_kqv option to llama and server	2023-12-18 15:36:09 -05:00
Andrei Betlen	472b344ae3	Remove unnused import	2023-12-18 15:32:40 -05:00
kddubey	6b2e0e05b4	perf: Don't convert logprobs arrays to lists (#1021 )	2023-12-18 14:28:12 -05:00
Brandon Roberts	62944df142	Bugfix: Remove f16_kv, add offload_kqv field (#1019 ) F16_KV appears to have been removed here: `af99c6fbfc` This addresses two issues: - #995 which just requests to add the KV cache offloading param - #1006 a NULL ptr exception when using the embeddings (introduced by leaving f16_kv in the fields struct)	2023-12-18 14:27:11 -05:00
Daniele Morotti	f1c631dc53	Bug fixed with n_ctx=0 (#1015 ) If the n_ctx is set to 0 the code should use the maximum context length of the selected model, but it didn't work. There was a problem with the initialization of this parameter and a related problem with 'n_batch'.	2023-12-16 18:59:50 -05:00
kddubey	5a8944672f	Fix logits_to_logprobs for 2-D and 3-D logits (#1002 ) * Fix logits_to_logprobs for 2-D and 3-D logits * Set dtype to single * Test size	2023-12-16 18:59:26 -05:00
Andrei Betlen	534b1ea9b5	Update llama.cpp	2023-12-16 18:57:43 -05:00
Andrei Betlen	cbce061ffd	Bump version	2023-12-13 21:52:29 -05:00
yhfgyyf	8b4db732bd	Add qwen chat format (#1005 )	2023-12-13 21:43:43 -05:00
Andrei Betlen	690c563b60	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-12-13 21:43:19 -05:00
Andrei Betlen	c0fc0a1e82	Update llama.cpp	2023-12-13 21:43:16 -05:00
Radoslav Gerganov	8e44a32075	Add support for running the server with SSL (#994 )	2023-12-11 20:47:11 -05:00
Tanner Hobson	ef22e478db	Replace logits_to_logprobs implementation with numpy equivalent to llama.cpp (#991 ) See #990. This change makes the logits_to_logprobs function equivalent to the version in the llama.cpp repository. It uses numpy so it's much faster than the previous version.	2023-12-11 20:46:27 -05:00
zocainViken	ac35f68e4d	Fix UnsupportedOperation: fileno in suppress_stdout_stderr (#961 ) * bug fixing * llava from readme got this error: UnsupportedOperation: fileno quick fix by checking hasattr * multi modal params fix: add logits = True -> to make llava work * multi modal params fix: add logits = True -> to make llava work --------- Co-authored-by: Andrei <abetlen@gmail.com>	2023-12-11 20:44:51 -05:00
chiensen	b938cccf05	Add Pygmalion chat format (#986 )	2023-12-11 20:44:04 -05:00
Andrei Betlen	c1e73e73a3	Bump version	2023-12-11 10:26:42 -05:00
Andrei Betlen	ec26f364cc	Remove f16_kv	2023-12-11 10:25:37 -05:00
Andrei Betlen	f1edc66b21	Update llama.cpp	2023-12-11 10:21:35 -05:00
kddubey	b069d06346	Fix #891 (#952 )	2023-11-29 05:39:52 -05:00
Andrei Betlen	ad963a0961	Bump version	2023-11-28 04:58:20 -05:00
Andrei Betlen	e3941d9c67	Make building llava optional	2023-11-28 04:55:21 -05:00
Andrei Betlen	7f3704b896	Bump version	2023-11-27 19:14:25 -05:00
Andrei Betlen	396dbf0b2b	docs: Improve low-level docstrings	2023-11-27 19:03:02 -05:00
Andrei Betlen	a928893d03	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-11-26 15:57:13 -05:00
Andrei Betlen	6308f21d5e	docs: Update Llama docs	2023-11-26 15:56:40 -05:00
Gardner Bickford	c2d63a7148	fix: Typo in the Open Orca chat format #874 (#947 )	2023-11-26 15:39:18 -05:00
Andrei Betlen	f03a38e62a	Update llama.cpp	2023-11-26 15:38:22 -05:00
Andrei Betlen	1a7bf2037b	docs: Update openapi endpoint names	2023-11-24 03:39:29 -05:00
Andrei Betlen	4026166e68	docs: Update completion and chat_completion parameter docstrings	2023-11-24 03:24:19 -05:00
Andrei Betlen	8c3aa7858b	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-11-24 00:15:36 -05:00
Andrei Betlen	de2e2bc083	misc fix verbose printing in functionary model	2023-11-23 20:14:23 -05:00
Andrei Betlen	36048d46af	Update llama.cpp	2023-11-23 16:26:00 -05:00
mrfakename	d68fc07b1b	Add Zephyr format (#937 )	2023-11-23 01:20:08 -05:00
caiyesd	4184835078	Add chat format to support baichuan (#938 ) Signed-off-by: caiyesd <caiyesd@gmail.com>	2023-11-23 01:19:50 -05:00
Andrei Betlen	c647f01609	Add from_json_schema to LlamaGrammar	2023-11-23 00:27:00 -05:00
Andrei Betlen	be1f64d569	docs: Add docstrings from llama.cpp	2023-11-23 00:26:26 -05:00
Andrei Betlen	b6bb7ac76a	docs: Add Llama class example	2023-11-22 23:10:04 -05:00
caiyesd	b8f29f4bf0	Add baichuan-2 chat format (#936 ) Signed-off-by: caiyesd <caiyesd@gmail.com>	2023-11-22 06:08:06 -05:00
Andrei Betlen	8b6ca22846	Fix type warnings for json schema grammar converter	2023-11-21 13:32:00 -05:00
Andrei Betlen	230fc8b535	Bump version	2023-11-21 05:04:55 -05:00
Andrei Betlen	128dc4731f	Fix #569	2023-11-21 04:39:05 -05:00
Andrei Betlen	7a3f87846b	Format	2023-11-21 04:02:20 -05:00
Andrei Betlen	422ebc89ce	Fix: Add logit_bias to all completion api methods	2023-11-21 04:01:36 -05:00
Andrei Betlen	07e47f55ba	Add support for logit_bias outside of server api. Closes #827	2023-11-21 03:59:46 -05:00
Maarten ter Huurne	c21edb6908	Do not set `grammar` to `None` for new `LlamaGrammar` objects (#834 ) * Do not set `grammar` to `None` for new `LlamaGrammar` objects The `grammar` attribute is written by `init()`, but that method always returns `None`, so `__init__()` would then discard the previously written object. * Add minimal test for grammar parsing	2023-11-21 00:23:18 -05:00
mrfakename	ef65fc5ff4	Add MistralLite, Intel, and OpenChat prompt formats (#927 ) * Add MistralLite format * Update llama_chat_format.py * Update llama_chat_format.py	2023-11-21 00:19:25 -05:00
TK-Master	b8438f70b5	Added support for min_p (#921 ) * Added support for min_p My small contribution to this great project. Ref: https://github.com/ggerganov/llama.cpp/pull/3841 Closes: https://github.com/abetlen/llama-cpp-python/issues/911 * Fix for negative temp (sample_softmax)	2023-11-20 23:21:33 -05:00
Andrei Betlen	a34d480141	Fix #929	2023-11-20 22:50:59 -05:00
Andrei Betlen	2c2afa320f	Update llama.cpp	2023-11-20 14:11:33 -05:00
Andrei Betlen	f2901d840e	Bump version	2023-11-14 14:10:00 -05:00
Andrei Betlen	01846a76b9	Bump version	2023-11-10 16:36:12 -05:00
Andrei Betlen	b7e60b66f4	Bump version	2023-11-10 06:21:24 -05:00
Andrei Betlen	6f0b0b1b84	Fix sampling bug when logits_all=False	2023-11-10 05:15:41 -05:00
Andrei Betlen	d9b38e3e3a	Potential bugfix for eval	2023-11-10 04:41:19 -05:00
Andrei Betlen	b84d76a844	Fix: add default stop sequence to chatml chat format	2023-11-10 04:24:48 -05:00
Andrei Betlen	1b376c62b7	Update functionary for new OpenAI API	2023-11-10 02:51:58 -05:00
Andrei Betlen	17da8fb446	Add missing tool_calls finish_reason	2023-11-10 02:51:06 -05:00
Andrei Betlen	770df34436	Add $ref and $defs support to json schema converter	2023-11-10 02:50:46 -05:00
Andrei Betlen	faeae181b1	Fix: json_schema_to_gbnf should take string dump of json schema as input	2023-11-10 02:50:17 -05:00
Andrei Betlen	e7962d2c73	Fix: default max_tokens matches openai api (16 for completion, max length for chat completion)	2023-11-10 02:49:27 -05:00
Andrei Betlen	b62c449839	Bugfix: missing response_format for functionary and llava chat handlers	2023-11-09 00:55:23 -05:00
Andrei Betlen	fd41ed3a90	Add set_seed to Llama class	2023-11-08 11:09:41 -05:00
Andrei Betlen	ca4cb88351	Fix destructor NoneType is not callable error	2023-11-08 11:05:45 -05:00
Andrei Betlen	01cb3a0381	Bump version	2023-11-08 00:54:54 -05:00
Andrei Betlen	b30b9c338b	Add JSON mode support. Closes #881	2023-11-08 00:07:16 -05:00
Andrei Betlen	4852a6a39c	Fix built in GBNF grammar rules	2023-11-08 00:06:22 -05:00
Andrei Betlen	64f5153c35	Add seed parameter to chat handlers	2023-11-07 23:41:29 -05:00
Andrei Betlen	86aeb9f3a1	Add seed parameter support for completion and chat_completion requests. Closes #884	2023-11-07 23:37:28 -05:00
Damian Stewart	aab74f0b2b	Multimodal Support (Llava 1.5) (#821 ) * llava v1.5 integration * Point llama.cpp to fork * Add llava shared library target * Fix type * Update llama.cpp * Add llava api * Revert changes to llama and llama_cpp * Update llava example * Add types for new gpt-4-vision-preview api * Fix typo * Update llama.cpp * Update llama_types to match OpenAI v1 API * Update ChatCompletionFunction type * Reorder request parameters * More API type fixes * Even More Type Updates * Add parameter for custom chat_handler to Llama class * Fix circular import * Convert to absolute imports * Fix * Fix pydantic Jsontype bug * Accept list of prompt tokens in create_completion * Add llava1.5 chat handler * Add Multimodal notebook * Clean up examples * Add server docs --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2023-11-07 22:48:51 -05:00
Andrei Betlen	56171cf7bf	Bump version	2023-11-06 09:37:55 -05:00
Andrei Betlen	be0add1b2d	Fix type bug	2023-11-06 09:30:38 -05:00
Andrei Betlen	e214a58422	Refactor Llama class internals	2023-11-06 09:16:36 -05:00
Andrei Betlen	bbffdaebaa	Refactor autotokenizer format to reusable function	2023-11-06 09:07:27 -05:00
Joe	4ff8def4d0	#717 : Add support for Huggingface Autotokenizer (#790 ) Co-authored-by: Andrei <abetlen@gmail.com>	2023-11-05 18:06:36 -05:00
earonesty	3580e2c5df	Update llama_chat_format.py (#869 ) * Update llama_chat_format.py properly formal llama2 with first-message prompt embedded * Update llama_chat_format.py	2023-11-05 17:00:13 -05:00
Andrei Betlen	f0b30ef7dc	Update llama.cpp	2023-11-05 16:57:10 -05:00
Andrei Betlen	2ec043af76	Clean up stdout / stderr suppression	2023-11-03 13:02:15 -04:00
Andrei Betlen	4ea7027c41	Rename internal only module utils to _utils	2023-11-03 12:55:55 -04:00
Andrei Betlen	df9362eeea	Update llama.cpp	2023-11-03 11:34:50 -04:00
Andrei	3af7b21ff1	Add functionary support (#784 ) * Add common grammars and json-schema-to-grammar utility function from llama.cpp * Pass functions to format function * Add basic functionary formatting * Add LlamaChatHandler for more complex chat use cases * Add function calling example notebook * Add support for regular chat completions alongside function calling	2023-11-03 02:12:14 -04:00
Andrei	ab028cb878	Migrate inference to llama_batch and llama_decode api (#795 ) * Add low-level batching notebook * fix: tokenization of special characters: (#850) It should behave like llama.cpp, where most out of the box usages treat special characters accordingly * Update CHANGELOG * Cleanup * Fix runner label * Update notebook * Use llama_decode and batch api * Support logits_all parameter --------- Co-authored-by: Antoine Lizee <antoine.lizee@gmail.com>	2023-11-02 20:13:57 -04:00
Andrei Betlen	8350de9a18	Bump version	2023-11-02 15:53:01 -04:00
Andrei Betlen	011b95d7f3	Fix name 'open' is not defined exception. Closes #860	2023-11-02 15:30:55 -04:00
Andrei Betlen	fa83cc5f9c	Update llama.cpp Fix build examples Exclude examples directory Revert cmake changes Try actions/checkout@v4 Try to update submodules Revert Update llama.cpp Fix build examples Exclude examples directory Revert cmake changes Try actions/checkout@v4 Try to update submodules Revert	2023-11-02 14:28:15 -04:00
Antoine Lizee	4d4e0f11e2	fix: tokenization of special characters: (#850 ) It should behave like llama.cpp, where most out of the box usages treat special characters accordingly	2023-11-02 14:28:14 -04:00
Andrei Betlen	6b3aa7fc8f	Bump version	2023-11-01 19:25:03 -04:00
Sujeendran Menon	7b136bb5b1	Fix for shared library not found and compile issues in Windows (#848 ) * fix windows library dll name issue * Updated README.md Windows instructions * Update llama_cpp.py to handle different windows dll file versions	2023-11-01 18:55:57 -04:00
cebtenzzre	eefd76fe81	llama: fix exception in Llama.__del__ (#846 )	2023-11-01 18:53:57 -04:00
David Ponce	3fc9147218	Iterate over tokens that should be biased rather than the entire vocabulary. (#851 )	2023-11-01 18:53:47 -04:00
Marko Tasic	9c8f4dca5f	fixed Llama._create_completion suffix check, it can be either None or str instance (#854 )	2023-11-01 18:52:50 -04:00
Daniel Thuerck	5f8f369d1b	Pass-Through grammar parameter in web server. (#855 ) Closes #778	2023-11-01 18:51:12 -04:00
Adam Katora	25cb710281	Update llama_types.py (#849 ) Minor typo fix, funcion -> function	2023-11-01 18:50:11 -04:00
Andrei Betlen	d808fd436c	Update llama.cpp	2023-10-31 21:29:35 -04:00
Andrei Betlen	53861c9e53	Update llama.cpp	2023-10-24 03:13:32 -04:00
gmcgoldr	09a8406c83	Fix streaming doesn't return finish reason (#798 ) When streaming the yield that contains the finish can be skipped. This change ensures that yield isn't skipped.	2023-10-19 02:55:56 -04:00
Andrei Betlen	28c2b884e2	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-10-19 02:55:31 -04:00
Andrei Betlen	ff580031d2	Update llama.cpp	2023-10-19 02:55:08 -04:00
Xiaoyu Kevin Hu	a315128d66	update value check for n_gpu_layers field (#826 )	2023-10-18 18:25:25 -04:00
Pierre Alexandre SCHEMBRI	10304d75fc	Make use of suppress_stdout_stderr when freeing model (#803 )	2023-10-15 13:52:43 -04:00
Ma, Guokai	a1ac199980	Fix repeat greeting (#808 ) * fix repeated greeting * remove seperator between role and message	2023-10-15 13:52:21 -04:00
Eric Liu	b50166500e	Add validation for tensor_split size exceeding LLAMA_MAX_DEVICES (#820 ) * Add validation for tensor_split size exceeding LLAMA_MAX_DEVICES * reword	2023-10-15 13:51:51 -04:00
Andrei Betlen	d6a130a052	Print traceback on server error	2023-10-10 15:56:04 -04:00
Andrei Betlen	43dfe1e2ab	Update llama.cpp	2023-10-05 16:07:49 -04:00
Andrei Betlen	a7d17b8ac9	Update llama.cpp	2023-10-03 15:23:35 -04:00
Andrei Betlen	305482bd41	Add chatml chat format	2023-09-30 21:01:34 -04:00
Andrei Betlen	5ef5280ef9	Log server exceptions to stdout	2023-09-30 19:13:36 -04:00
Andrei Betlen	fab4bccc35	Bump version	2023-09-30 16:04:46 -04:00
Andrei Betlen	d696251fbe	Fix logits_all bug	2023-09-30 16:02:35 -04:00
Andrei Betlen	6ee413d79e	Bump version	2023-09-30 13:23:09 -04:00
Andrei Betlen	42bb721d64	Fix bug in embedding	2023-09-30 13:20:22 -04:00
Andrei Betlen	5d62d55a82	Bump version	2023-09-30 00:07:06 -04:00
Andrei Betlen	386c88b68e	Bump version	2023-09-29 20:07:31 -04:00
Andrei Betlen	d9bce17794	Update server params	2023-09-29 19:59:12 -04:00
Andrei Betlen	3720c739d4	Update llama.cpp	2023-09-29 19:58:21 -04:00
Andrei	3bca7708fb	Configurable Chat Formats (#711 ) * Add configurable default chat completion format. * Remove chat_template file to avoid circular import * Update llama_types * Add chat format	2023-09-29 19:52:04 -04:00
Josh XT	a945404b4a	Fix rope scaling defaults (#767 ) * Fix rope scale with backwards compatibility * Fix defaults * Fix op * Remove backwards compatibility * Check single val	2023-09-29 16:03:57 -04:00
Andrei Betlen	1a1c3dc418	Update llama.cpp	2023-09-28 22:42:03 -04:00
Andrei Betlen	4177ae6d34	Bump version	2023-09-25 14:38:38 -04:00
Viacheslav/Slava Tradunsky	3d5e5b1c04	Adds openai-processing-ms response header (#748 )	2023-09-25 13:55:58 -04:00
Andrei Betlen	dbca136fea	Update llama_types and names to match openai api	2023-09-20 15:38:26 -04:00
Andrei Betlen	38e34c97f0	Update llama.cpp	2023-09-18 16:11:27 -04:00
Andrei Betlen	8d75016549	Install required runtime dlls to package directory on windows	2023-09-16 14:57:49 -04:00
Andrei Betlen	acf18fcdf0	Bump version	2023-09-15 14:22:21 -04:00
Andrei Betlen	b047b3034e	Remove confusing helpstring from server cli args. Closes #719	2023-09-15 14:09:43 -04:00
Andrei Betlen	24fec0b242	Bump version	2023-09-14 18:33:08 -04:00
Andrei Betlen	8474665625	Update base_path to fix issue resolving dll in windows isolation container.	2023-09-14 14:51:43 -04:00
Andrei Betlen	507bcc7171	Bump version	2023-09-13 23:15:23 -04:00
Andrei Betlen	0449d29b9f	Fix boolean env vars and cli arguments	2023-09-13 23:09:57 -04:00
earonesty	58a6e42cc0	Update app.py (#705 )	2023-09-13 23:01:34 -04:00
Andrei Betlen	f4090a0bb2	Add numa support, low level api users must now explicitly call llama_backend_init at the start of their programs.	2023-09-13 23:00:43 -04:00
Andrei Betlen	c999325e8e	Fix boolean cli flags	2023-09-13 22:56:10 -04:00
Andrei Betlen	4daf77e546	Format	2023-09-13 21:23:23 -04:00
Andrei Betlen	2920c4bf7e	Update server params. Added lora_base, lora_path, low_vram, and main_gpu. Removed rms_norm_eps and n_gqa (deprecated in llama.cpp)	2023-09-13 21:23:13 -04:00
Andrei Betlen	6a20293fc2	Reorder init params to match llama.cpp order	2023-09-13 21:20:26 -04:00
Andrei Betlen	c8f9b8a734	Explicitly make all init params other than model_path into keyword only params	2023-09-13 21:19:47 -04:00
Andrei Betlen	a68f9e2791	Add kwargs to init to catch extra params	2023-09-13 21:19:02 -04:00
Andrei Betlen	9e345a47a2	remove print	2023-09-13 21:12:27 -04:00
Andrei Betlen	517f9ed80b	Convert missed llama.cpp constants into standard python types	2023-09-13 21:11:52 -04:00
Andrei Betlen	c4c440ba2d	Fix tensor_split cli option	2023-09-13 20:00:42 -04:00
Andrei Betlen	203ede4ba2	Bump version	2023-09-13 18:07:08 -04:00
Andrei Betlen	759405c84b	Fix issue with Literal and Optional cli arguments not working. Closes #702	2023-09-13 18:06:12 -04:00
Devrim	da9df78db0	Add X-Request-ID request header for mirroring custom IDs. (#703 )	2023-09-13 16:18:31 -04:00
Andrei Betlen	8e13520796	Bump version	2023-09-13 01:47:58 -04:00
Andrei Betlen	2787663a25	Bump version	2023-09-12 21:00:01 -04:00
Andrei Betlen	6e89775759	Bump version	2023-09-12 18:57:01 -04:00
Andrei Betlen	bb4e67e7aa	Using dynamic version	2023-09-12 18:56:36 -04:00
Andrei Betlen	1910793f56	Merge branch 'main' into v0.2-wip	2023-09-12 16:43:32 -04:00
Andrei Betlen	c7901f1141	Bump version	2023-09-12 16:16:40 -04:00
janvdp	33ce931cce	merge upstream	2023-09-09 21:21:04 +02:00
Andrei Betlen	d3f63211ef	Update llama.cpp	2023-09-09 12:12:32 -04:00
janvdp	da0fdafc32	import version in __init__.py	2023-09-05 21:09:28 +02:00
janvdp	6e8e64d09a	add version file	2023-09-05 21:09:08 +02:00
Andrei Betlen	186626d58e	Update llama.cpp	2023-09-01 14:26:13 -04:00
Andrei Betlen	47de3ab104	Update llama.cpp	2023-08-29 07:36:20 -04:00
Andrei Betlen	3f76e1de52	cjk pr minor cleanup	2023-08-29 07:21:59 -04:00
Andrei	bae44ec8bf	Merge pull request #309 from MeouSker77/fix-CJK Fix CJK and emoji stream output	2023-08-29 06:58:10 -04:00
Andrei Betlen	e0dcbc28a1	Update llama.cpp	2023-08-28 10:33:45 -04:00
Andrei Betlen	4887973c22	Update llama.cpp	2023-08-27 12:59:20 -04:00
Andrei Betlen	3a29d65f45	Update llama.cpp	2023-08-26 23:36:24 -04:00
Andrei Betlen	5de8009706	Add copilot-codex completions endpoint for drop-in copilot usage	2023-08-25 17:49:14 -04:00
Andrei Betlen	ac47d55577	Merge branch 'main' into v0.2-wip	2023-08-25 15:45:22 -04:00
Andrei Betlen	ef23d1e545	Update llama.cpp	2023-08-25 14:35:53 -04:00
Andrei Betlen	48cf43b427	Use _with_model variants for tokenization	2023-08-25 13:43:16 -04:00
Andrei Betlen	8ac59465b9	Strip leading space when de-tokenizing.	2023-08-25 04:56:48 -04:00
Andrei Betlen	c2d1deaa8a	Update llama.cpp	2023-08-24 18:01:42 -04:00
Andrei Betlen	db982a861f	Fix	2023-08-24 01:01:12 -04:00
Andrei Betlen	4ed632c4b3	Remove deprecated params	2023-08-24 01:01:05 -04:00
Andrei Betlen	cf405f6764	Merge branch 'main' into v0.2-wip	2023-08-24 00:30:51 -04:00
Andrei Betlen	bbbf0f4fc4	Update llama.cpp	2023-08-24 00:17:00 -04:00
Andrei Betlen	e632c59fa0	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-08-17 20:53:04 -04:00
c0sogi	a240aa6b25	Fix typos in llama_grammar	2023-08-17 21:00:44 +09:00
Andrei Betlen	620cd2fd69	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-08-14 22:41:47 -04:00
Andrei Betlen	5788f1f2b2	Remove unnused import	2023-08-14 22:41:37 -04:00
Andrei	6dfb98117e	Merge pull request #600 from Vuizur/main Add py.typed to conform with PEP 561	2023-08-14 22:40:41 -04:00
Andrei	b99e758045	Merge pull request #604 from aliencaocao/main-1 Add doc string for n_gpu_layers argument and make -1 offload all layers	2023-08-14 22:40:10 -04:00
Andrei Betlen	b345d60987	Update llama.cpp	2023-08-14 22:33:30 -04:00
Billy Cao	c471871d0b	make n_gpu_layers=-1 offload all layers	2023-08-13 11:21:28 +08:00
Billy Cao	d018c7b01d	Add doc string for n_gpu_layers argument	2023-08-12 18:41:47 +08:00
Hannes Krumbiegel	17dd7fa8e0	Add py.typed	2023-08-11 09:58:48 +02:00
MeouSker77	88184ed217	fix CJK output again	2023-08-09 22:04:35 +08:00
Andrei Betlen	66fb0345e8	Move grammar to function call argument	2023-08-08 15:08:54 -04:00
Andrei Betlen	1e844d3238	fix	2023-08-08 15:07:28 -04:00
Andrei Betlen	843b7ccd90	Merge branch 'main' into c0sogi/main	2023-08-08 14:43:02 -04:00
Andrei Betlen	d015bdb4f8	Add mul_mat_q option	2023-08-08 14:35:06 -04:00
Andrei Betlen	f6a7850e1a	Update llama.cpp	2023-08-08 14:30:58 -04:00
c0sogi	0d7d2031a9	prevent memory access error by llama_grammar_free	2023-08-07 17:02:33 +09:00
c0sogi	b07713cb9f	reset grammar for every generation	2023-08-07 15:16:25 +09:00
c0sogi	418aa83b01	Added grammar based sampling	2023-08-07 02:21:37 +09:00
c0sogi	ac188a21f3	Added low level grammar API	2023-08-05 14:43:35 +09:00
Andrei Betlen	ce57920e60	Suppress llama.cpp output when loading model.	2023-07-28 14:45:18 -04:00
Andrei Betlen	a9b9f0397c	Format	2023-07-28 01:53:08 -04:00
Andrei Betlen	abc538fcd5	fix: annoying bug where attribute exceptions were droining out file not found exceptions	2023-07-28 01:43:00 -04:00
Shouyi Wang	426dbfe3f4	Change tensor_split from array to pointer	2023-07-25 18:29:59 +10:00
Andrei Betlen	078902a6fe	Add llama_grammar_accept_token	2023-07-24 15:55:26 -04:00
Andrei Betlen	bf901773b0	Add llama_sample_grammar	2023-07-24 15:42:31 -04:00
Andrei Betlen	1b6997d69f	Convert constants to python types and allow python types in low-level api	2023-07-24 15:42:07 -04:00
Andrei Betlen	343480364f	Merge branch 'main' into v0.2-wip	2023-07-24 15:26:08 -04:00
Andrei Betlen	11dd2bf382	Add temporary rms_norm_eps parameter	2023-07-24 14:09:24 -04:00
Andrei Betlen	8cd64d4ac3	Add rms_eps_norm	2023-07-24 13:52:12 -04:00
bretello	0f09f10e8c	add support for llama2 70b	2023-07-24 19:38:24 +02:00
Andrei Betlen	77c9f496b0	Merge branch 'main' into v0.2-wip	2023-07-24 13:19:54 -04:00
Andrei Betlen	401309d11c	Revert "Merge pull request #521 from bretello/main" This reverts commit `07f0f3a386`, reversing changes made to `d8a3ddbb1c`.	2023-07-24 13:11:10 -04:00
Andrei	07f0f3a386	Merge pull request #521 from bretello/main raise exception when `llama_load_model_from_file` fails	2023-07-24 13:09:28 -04:00
Andrei Betlen	d8a3ddbb1c	Update llama.cpp	2023-07-24 13:08:06 -04:00
Andrei Betlen	985d559971	Update llama.cpp	2023-07-24 13:04:34 -04:00
bretello	8be7d67f7e	raise exception when `llama_load_model_from_file` fails	2023-07-24 14:42:37 +02:00
Andrei Betlen	436036aa67	Merge branch 'main' into v0.2-wip	2023-07-21 12:42:38 -04:00
Andrei Betlen	b83728ad1e	Update llama.cpp	2023-07-21 12:33:27 -04:00
Andrei Betlen	0538ba1dab	Merge branch 'main' into v0.2-wip	2023-07-20 19:06:26 -04:00
Andrei Betlen	01435da740	Update llama.cpp	2023-07-20 18:54:25 -04:00
Andrei Betlen	28a111704b	Fix compatibility with older python versions	2023-07-20 18:52:10 -04:00
Andrei Betlen	d10ce62714	Revert ctypes argtype change	2023-07-20 18:51:53 -04:00
Andrei	365d9a4367	Merge pull request #481 from c0sogi/main Added `RouteErrorHandler` for server	2023-07-20 17:41:42 -04:00
Vinicius	a8551477f5	Update llama_cpp.py - Fix c_char_p to Array[c_char_p] and c_float to Array[c_float]	2023-07-20 17:29:11 -03:00
Carlos Tejada	0756a2d3fb	Now the last token sent when `stream=True`	2023-07-19 22:47:14 -04:00
Andrei Betlen	0b121a7456	Format	2023-07-19 03:48:27 -04:00
Andrei Betlen	b43917c144	Add functions parameters	2023-07-19 03:48:20 -04:00
Andrei Betlen	19ba9d3845	Use numpy arrays for logits_processors and stopping_criteria. Closes #491	2023-07-18 19:27:41 -04:00
shutup	5ed8bf132f	expose RoPE param to server start	2023-07-18 16:34:36 +08:00
c0sogi	1551ba10bd	Added `RouteErrorHandler` for server	2023-07-16 14:57:39 +09:00
Andrei Betlen	8ab098e49d	Re-order Llama class params	2023-07-15 15:35:08 -04:00
Andrei Betlen	e4f9db37db	Fix context_params struct layout	2023-07-15 15:34:55 -04:00
Andrei Betlen	f0797a6054	Merge branch main into custom_rope	2023-07-15 15:11:01 -04:00
randoentity	3f8f276f9f	Add bindings for custom_rope	2023-07-10 17:37:46 +02:00
Andrei Betlen	a86bfdf0a5	bugfix: truncate completion max_tokens to fit context length by default	2023-07-09 18:13:29 -04:00
Andrei Betlen	6f70cc4b7d	bugfix: pydantic settings missing / changed fields	2023-07-09 18:03:31 -04:00
Andrei	5d756de314	Merge branch 'main' into add_unlimited_max_tokens	2023-07-08 02:37:38 -04:00
Andrei	b8e0bed295	Merge pull request #453 from wu-qing-157/main Fix incorrect token_logprobs (due to indexing after sorting)	2023-07-08 02:31:52 -04:00
Andrei Betlen	d6e6aad927	bugfix: fix compatibility bug with openai api on last token	2023-07-08 00:06:11 -04:00
Andrei Betlen	4f2b5d0b53	Format	2023-07-08 00:05:10 -04:00
Andrei Betlen	34c505edf2	perf: convert pointer to byref	2023-07-07 22:54:07 -04:00
Andrei Betlen	52753b77f5	Upgrade fastapi to 0.100.0 and pydantic v2	2023-07-07 21:38:46 -04:00
Andrei Betlen	11eae75211	perf: avoid allocating new buffers during sampling	2023-07-07 19:28:53 -04:00
Andrei Betlen	a14d8a9b3f	perf: assign to candidates data structure instead	2023-07-07 18:58:43 -04:00
wu-qing-157	9e61661518	fix indexing token_logprobs after sorting	2023-07-07 10:18:49 +00:00
Andrei Betlen	57d8ec3899	Add setting to control request interruption	2023-07-07 03:37:23 -04:00
Andrei Betlen	4c7cdcca00	Add interruptible streaming requests for llama-cpp-python server. Closes #183	2023-07-07 03:04:17 -04:00
Andrei Betlen	98ae4e58a3	Update llama.cpp	2023-07-06 17:57:56 -04:00
Andrei Betlen	b994296c75	Update llama.cpp	2023-07-05 01:00:14 -04:00
Andrei Betlen	c67f786360	Update llama.cpp	2023-06-29 01:08:15 -04:00
Andrei Betlen	e34f4414cf	Hotfix: logits_all bug	2023-06-29 00:57:27 -04:00
Andrei Betlen	a2ede37bd5	Load logits directly into scores buffer	2023-06-29 00:45:46 -04:00
Andrei Betlen	b95b0ffbeb	Use pre-allocated buffers to store input_ids and scores	2023-06-29 00:40:47 -04:00
Andrei Betlen	a5e059c053	Free model when llama is unloaded. Closes #434	2023-06-28 23:58:55 -04:00
Andrei Betlen	3379dc40a1	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-06-26 08:50:48 -04:00
Andrei Betlen	952228407e	Update llama.cpp	2023-06-26 08:50:38 -04:00
Andrei Betlen	b4a3db3e54	Update type signature	2023-06-26 08:50:30 -04:00
Andrei	5eb4ebb041	Merge branch 'main' into fix-state-pickle	2023-06-26 08:45:02 -04:00
samfundev	d788fb49bf	Only concatenate after all batches are done	2023-06-24 15:51:46 -04:00
Andrei	877ca6d016	Merge branch 'main' into fix-state-pickle	2023-06-23 15:13:07 -04:00
Alexey	282698b6d3	server: pass seed param from command line to llama	2023-06-23 00:19:24 +04:00
Andrei Betlen	e37798777e	Update llama.cpp	2023-06-20 11:25:10 -04:00
Andrei Betlen	d410f12fae	Update docs. Closes #386	2023-06-17 13:38:48 -04:00
Andrei Betlen	9f528f4715	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-06-17 13:37:17 -04:00
Andrei Betlen	d7153abcf8	Update llama.cpp	2023-06-16 23:11:14 -04:00
imaprogrammer	fd9f294b3a	Update llama.py: Added how many input tokens in ValueError exception	2023-06-16 14:11:57 +05:30
Andrei Betlen	1e20be6d0c	Add low_vram to server settings	2023-06-14 22:13:42 -04:00
Andrei Betlen	44b83cada5	Add low_vram parameter	2023-06-14 22:12:33 -04:00
Andrei Betlen	f7c5cfaf50	Format server options	2023-06-14 22:08:28 -04:00
Andrei Betlen	9c41a3e990	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-06-14 21:50:43 -04:00
Andrei	f568baeef1	Merge pull request #351 from player1537-forks/th/add-logits-bias-parameter Add support for `logit_bias` and `logit_bias_type` parameters	2023-06-14 21:49:56 -04:00
Andrei Betlen	f27393ab7e	Add additional verbose logs for cache	2023-06-14 21:46:48 -04:00
Andrei Betlen	4cefb70cd0	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-06-14 21:40:19 -04:00
Andrei Betlen	715f98c591	Update llama.cpp	2023-06-14 21:40:13 -04:00
Okabintaro	10b0cb727b	fix: Make LLamaState pickable for disk cache I fixed the issue by making the saved state a bytes object instead of the ctypes one which can't be pickled.	2023-06-13 12:03:31 +02:00
Gabor	3129a0e7e5	correction to add back environment variable support <3 docker	2023-06-11 01:11:24 +01:00
Gabor	3ea31930e5	fixes abetlen/llama-cpp-python #358	2023-06-11 00:58:08 +01:00
Andrei Betlen	21acd7901f	Re-enable cache	2023-06-10 12:22:31 -04:00
Andrei Betlen	6639371407	Update llama.cpp	2023-06-10 12:17:38 -04:00
Tanner Hobson	eb7645b3ba	Add support for logit_bias and logit_bias_type parameters	2023-06-09 13:13:08 -04:00

813 commits