baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Lucas Doyle	6d8db9d017	tests: simple test for server module	2023-04-29 11:42:20 -07:00
Lucas Doyle	468377b0e2	llama_cpp server: app is now importable, still runnable as a module	2023-04-29 11:41:25 -07:00
Andrei	755f9fa455	Merge pull request #118 from SagsMug/main Fix UnicodeDecodeError permanently	2023-04-29 07:19:01 -04:00
Mug	18a0c10032	Remove excessive errors="ignore" and add utf8 test	2023-04-29 12:19:22 +02:00
Andrei Betlen	ea0faabae1	Update llama.cpp	2023-04-28 15:32:43 -04:00
Mug	b7d14efc8b	Python weirdness	2023-04-28 13:20:31 +02:00
Mug	eed61289b6	Don't detect off tokens, detect off detokenized utf8	2023-04-28 13:16:18 +02:00
Mug	3a98747026	One day, I'll fix off by 1 errors permanently too	2023-04-28 12:54:28 +02:00
Mug	c39547a986	Detect multi-byte responses and wait	2023-04-28 12:50:30 +02:00
Andrei Betlen	9339929f56	Update llama.cpp	2023-04-26 20:00:54 -04:00
Mug	5f81400fcb	Also ignore errors on input prompts	2023-04-26 14:45:51 +02:00
Mug	be2c961bc9	Merge branch 'main' of https://github.com/abetlen/llama-cpp-python	2023-04-26 14:38:09 +02:00
Mug	c4a8491d42	Fix decode errors permanently	2023-04-26 14:37:06 +02:00
Andrei Betlen	cbd26fdcc1	Update llama.cpp	2023-04-25 19:03:41 -04:00
Andrei Betlen	3cab3ef4cb	Update n_batch for server	2023-04-25 09:11:32 -04:00
Andrei Betlen	cc706fb944	Add ctx check and re-order __init__. Closes #112	2023-04-25 09:00:53 -04:00
Andrei Betlen	d484c5634e	Bugfix: Check cache keys as prefix to prompt tokens	2023-04-24 22:18:54 -04:00
Andrei Betlen	cbe95bbb75	Add cache implementation using llama state	2023-04-24 19:54:41 -04:00
Andrei Betlen	2c359a28ff	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-04-24 17:51:27 -04:00
Andrei Betlen	197cf80601	Add save/load state api for Llama class	2023-04-24 17:51:25 -04:00
Andrei Betlen	86f8e5ad91	Refactor internal state for Llama class	2023-04-24 15:47:54 -04:00
Andrei	f37456133a	Merge pull request #108 from eiery/main Update n_batch default to 512 to match upstream llama.cpp	2023-04-24 13:48:09 -04:00
Andrei Betlen	02cf881317	Update llama.cpp	2023-04-24 09:30:10 -04:00
eiery	aa12d8a81f	Update llama.py update n_batch default to 512 to match upstream llama.cpp	2023-04-23 20:56:40 -04:00
Andrei Betlen	7230599593	Disable mmap when applying lora weights. Closes #107	2023-04-23 14:53:17 -04:00
Andrei Betlen	e99caedbbd	Update llama.cpp	2023-04-22 19:50:28 -04:00
Andrei Betlen	1eb130a6b2	Update llama.cpp	2023-04-21 17:40:27 -04:00
Andrei Betlen	e4647c75ec	Add use_mmap flag to server	2023-04-19 15:57:46 -04:00
Andrei Betlen	0df4d69c20	If lora base is not set avoid re-loading the model by passing NULL	2023-04-18 23:45:25 -04:00
Andrei Betlen	95c0dc134e	Update type signature to allow for null pointer to be passed.	2023-04-18 23:44:46 -04:00
Andrei Betlen	453e517fd5	Add separate lora_base path for applying LoRA to quantized models using original unquantized model weights.	2023-04-18 10:20:46 -04:00
Andrei Betlen	eb7f278cc6	Add lora_path parameter to Llama model	2023-04-18 01:43:44 -04:00
Andrei Betlen	35abf89552	Add bindings for LoRA adapters. Closes #88	2023-04-18 01:30:04 -04:00
Andrei Betlen	89856ef00d	Bugfix: only eval new tokens	2023-04-15 17:32:53 -04:00
Andrei Betlen	92c077136d	Add experimental cache	2023-04-15 12:03:09 -04:00
Andrei Betlen	a6372a7ae5	Update stop sequences for chat	2023-04-15 12:02:48 -04:00
Andrei Betlen	83b2be6dc4	Update chat parameters	2023-04-15 11:58:43 -04:00
Andrei Betlen	62087514c6	Update chat prompt	2023-04-15 11:58:19 -04:00
Andrei Betlen	02f9fb82fb	Bugfix	2023-04-15 11:39:52 -04:00
Andrei Betlen	3cd67c7bd7	Add type annotations	2023-04-15 11:39:21 -04:00
Andrei Betlen	d7de0e8014	Bugfix	2023-04-15 00:08:04 -04:00
Andrei Betlen	e90e122f2a	Use clear	2023-04-14 23:33:18 -04:00
Andrei Betlen	ac7068a469	Track generated tokens internally	2023-04-14 23:33:00 -04:00
Andrei Betlen	6e298d8fca	Set kv cache size to f16 by default	2023-04-14 22:21:19 -04:00
Andrei Betlen	6c7cec0c65	Fix completion request	2023-04-14 10:01:15 -04:00
Andrei Betlen	6153baab2d	Clean up logprobs implementation	2023-04-14 09:59:33 -04:00
Andrei Betlen	26cc4ee029	Fix signature for stop parameter	2023-04-14 09:59:08 -04:00
Andrei Betlen	6595ad84bf	Add field to disable resetting between generations	2023-04-13 00:28:00 -04:00
Andrei Betlen	22fa5a621f	Revert "Deprecate generate method" This reverts commit `6cf5876538`.	2023-04-13 00:19:55 -04:00
Andrei Betlen	4f5f99ef2a	Formatting	2023-04-12 22:40:12 -04:00
Andrei Betlen	0daf16defc	Enable logprobs on completion endpoint	2023-04-12 19:08:11 -04:00
Andrei Betlen	19598ac4e8	Fix threading bug. Closes #62	2023-04-12 19:07:53 -04:00
Andrei Betlen	005c78d26c	Update llama.cpp	2023-04-12 14:29:00 -04:00
Andrei Betlen	c854c2564b	Don't serialize stateful parameters	2023-04-12 14:07:14 -04:00
Andrei Betlen	2f9b649005	Style fix	2023-04-12 14:06:22 -04:00
Andrei Betlen	6cf5876538	Deprecate generate method	2023-04-12 14:06:04 -04:00
Andrei Betlen	b3805bb9cc	Implement logprobs parameter for text completion. Closes #2	2023-04-12 14:05:11 -04:00
Andrei Betlen	9f1e565594	Update llama.cpp	2023-04-11 11:59:03 -04:00
Andrei Betlen	213cc5c340	Remove async from function signature to avoid blocking the server	2023-04-11 11:54:31 -04:00
jm12138	90e1021154	Add unlimited max_tokens	2023-04-10 15:56:05 +00:00
Mug	2559e5af9b	Changed the environment variable name into "LLAMA_CPP_LIB"	2023-04-10 17:27:17 +02:00
Mug	ee71ce8ab7	Make windows users happy (hopefully)	2023-04-10 17:12:25 +02:00
Mug	cf339c9b3c	Better custom library debugging	2023-04-10 17:06:58 +02:00
Mug	4132293d2d	Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into local-lib	2023-04-10 17:00:42 +02:00
Mug	76131d5bb8	Use environment variable for library override	2023-04-10 17:00:35 +02:00
Andrei Betlen	1f67ad2a0b	Add use_mmap option	2023-04-10 02:11:35 -04:00
Andrei Betlen	c3c2623e8b	Update llama.cpp	2023-04-09 22:01:33 -04:00
Andrei Betlen	314ce7d1cc	Fix cpu count default	2023-04-08 19:54:04 -04:00
Andrei Betlen	3fbc06361f	Formatting	2023-04-08 16:01:45 -04:00
Andrei Betlen	0067c1a588	Formatting	2023-04-08 16:01:18 -04:00
Andrei Betlen	38f442deb0	Bugfix: Wrong size of embeddings. Closes #47	2023-04-08 15:05:33 -04:00
Andrei Betlen	ae3e9c3d6f	Update shared library extension for macos	2023-04-08 02:45:21 -04:00
Andrei Betlen	da539cc2ee	Safer calculation of default n_threads	2023-04-06 21:22:19 -04:00
Andrei Betlen	930db37dd2	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-04-06 21:07:38 -04:00
Andrei Betlen	55279b679d	Handle prompt list	2023-04-06 21:07:35 -04:00
MillionthOdin16	c283edd7f2	Set n_batch to default values and reduce thread count: Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects quality of generations. (It shouldn't) But in case that's still an issue I changed to default. Set auto-determined num of threads to 1/2 system count. ggml will sometimes lock cores at 100% while doing nothing. This is being addressed, but can cause bad experience for user if pegged at 100%	2023-04-05 18:17:29 -04:00
MillionthOdin16	76a82babef	Set n_batch to the default value of 8. I think this is leftover from when n_ctx was missing and n_batch was 2048.	2023-04-05 17:44:53 -04:00
Andrei Betlen	44448fb3a8	Add server as a subpackage	2023-04-05 16:23:25 -04:00
Mug	e3ea354547	Allow local llama library usage	2023-04-05 14:23:01 +02:00
Andrei Betlen	e96a5c5722	Make Llama instance pickleable. Closes #27	2023-04-05 06:52:17 -04:00
Andrei Betlen	7643f6677d	Bugfix for Python3.7	2023-04-05 04:37:33 -04:00
Andrei Betlen	cefc69ea43	Add runtime check to ensure embedding is enabled if trying to generate embeddings	2023-04-05 03:25:37 -04:00
Andrei Betlen	5c50af7462	Remove workaround	2023-04-05 03:25:09 -04:00
Andrei Betlen	51dbcf2693	Bugfix: wrong signature for quantize function	2023-04-04 22:36:59 -04:00
Andrei Betlen	c137789143	Add verbose flag. Closes #19	2023-04-04 13:09:24 -04:00
Andrei Betlen	5075c16fcc	Bugfix: n_batch should always be <= n_ctx	2023-04-04 13:08:21 -04:00
Andrei Betlen	caf3c0362b	Add return type for default __call__ method	2023-04-03 20:26:08 -04:00
Andrei Betlen	4aa349d777	Add docstring for create_chat_completion	2023-04-03 20:24:20 -04:00
Andrei Betlen	7fedf16531	Add support for chat completion	2023-04-03 20:12:44 -04:00
Andrei Betlen	3dec778c90	Update to more sensible return signature	2023-04-03 20:12:14 -04:00
Andrei Betlen	ae004eb69e	Fix #16	2023-04-03 18:46:19 -04:00
MillionthOdin16	a0758f0077	Update llama_cpp.py with PR requests lib_base_name and load_shared_library to _lib_base_name and _load_shared_library	2023-04-03 13:06:50 -04:00
MillionthOdin16	a40476e299	Update llama_cpp.py Make shared library code more robust with some platform specific functionality and more descriptive errors when failures occur	2023-04-02 21:50:13 -04:00
Andrei Betlen	1ed8cd023d	Update llama_cpp and add kv_cache api support	2023-04-02 13:33:49 -04:00
Andrei Betlen	4f509b963e	Bugfix: Stop sequences and missing max_tokens check	2023-04-02 03:59:19 -04:00
Andrei Betlen	353e18a781	Move workaround to new sample method	2023-04-02 00:06:34 -04:00
Andrei Betlen	a4a1bbeaa9	Update api to allow for easier interactive mode	2023-04-02 00:02:47 -04:00
Andrei Betlen	eef627c09c	Fix example documentation	2023-04-01 17:39:35 -04:00
Andrei Betlen	1e4346307c	Add documentation for generate method	2023-04-01 17:36:30 -04:00
Andrei Betlen	67c70cc8eb	Add static methods for beginning and end of sequence tokens.	2023-04-01 17:29:30 -04:00
Andrei Betlen	318eae237e	Update high-level api	2023-04-01 13:01:27 -04:00
Andrei Betlen	69e7d9f60e	Add type definitions	2023-04-01 12:59:58 -04:00
Andrei Betlen	49c8df369a	Fix type signature of token_to_str	2023-03-31 03:25:12 -04:00
Andrei Betlen	670d390001	Fix ctypes typing issue for Arrays	2023-03-31 03:20:15 -04:00
Andrei Betlen	1545b22727	Fix array type signatures	2023-03-31 02:08:20 -04:00
Andrei Betlen	c928e0afc8	Formatting	2023-03-31 00:00:27 -04:00
Andrei Betlen	8908f4614c	Update llama.cpp	2023-03-28 21:10:23 -04:00
Andrei Betlen	70b8a1ef75	Add support to get embeddings from high-level api. Closes #4	2023-03-28 04:59:54 -04:00
Andrei Betlen	3dbb3fd3f6	Add support for stream parameter. Closes #1	2023-03-28 04:03:57 -04:00
Andrei Betlen	30fc0f3866	Extract generate method	2023-03-28 02:42:22 -04:00
Andrei Betlen	1c823f6d0f	Refactor Llama class and add tokenize / detokenize methods Closes #3	2023-03-28 01:45:37 -04:00
Andrei Betlen	8ae3beda9c	Update Llama to add params	2023-03-25 16:26:23 -04:00
Andrei Betlen	4525236214	Update llama.cpp	2023-03-25 16:26:03 -04:00
Andrei Betlen	b121b7c05b	Update docstring	2023-03-25 12:33:18 -04:00
Andrei Betlen	fa92740a10	Update llama.cpp	2023-03-25 12:12:09 -04:00
Andrei Betlen	df15caa877	Add mkdocs	2023-03-24 18:57:59 -04:00
Andrei Betlen	4da5faa28b	Bugfix: cross-platform method to find shared lib	2023-03-24 18:43:29 -04:00
Andrei Betlen	b93675608a	Handle errors returned by llama.cpp	2023-03-24 15:47:17 -04:00
Andrei Betlen	7786edb0f9	Black formatting	2023-03-24 14:59:29 -04:00
Andrei Betlen	c784d83131	Update llama.cpp and re-organize low-level api	2023-03-24 14:58:42 -04:00
Andrei Betlen	b9c53b88a1	Use n_ctx provided from actual context not params	2023-03-24 14:58:10 -04:00
Andrei Betlen	2cc499512c	Black formatting	2023-03-24 14:35:41 -04:00
Andrei Betlen	e24c581b5a	Implement prompt batch processing as in main.cpp	2023-03-24 14:33:38 -04:00
Andrei Betlen	a28cb92d8f	Remove model_name param	2023-03-24 04:04:29 -04:00
Andrei Betlen	eec9256a42	Bugfix: avoid decoding partial utf-8 characters	2023-03-23 16:25:13 -04:00
Andrei Betlen	e63ea4dbbc	Add support for logprobs	2023-03-23 15:51:05 -04:00
Andrei Betlen	465238b179	Updated package to build with skbuild	2023-03-23 13:54:14 -04:00
Andrei Betlen	79b304c9d4	Initial commit	2023-03-23 05:33:06 -04:00


628 commits