Commit graph

70 commits

Andrei Betlen
9f1e565594 Update llama.cpp 2023-04-11 11:59:03 -04:00
Andrei Betlen
213cc5c340 Remove async from function signature to avoid blocking the server 2023-04-11 11:54:31 -04:00
Mug
2559e5af9b Changed the environment variable name into "LLAMA_CPP_LIB" 2023-04-10 17:27:17 +02:00
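
These two library-override commits (this one and 76131d5bb8 below) let users point the bindings at a locally built llama.cpp shared library via an environment variable. A minimal usage sketch, assuming a Unix-style library path (the filename varies by platform, and the path here is a placeholder):

```python
import os

# Point the bindings at a locally built llama.cpp shared library.
# Must be set before llama_cpp is imported; the path is illustrative.
os.environ["LLAMA_CPP_LIB"] = "/path/to/libllama.so"

import llama_cpp  # the override is picked up during module import
```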
Mug
ee71ce8ab7 Make Windows users happy (hopefully) 2023-04-10 17:12:25 +02:00
Mug
cf339c9b3c Better custom library debugging 2023-04-10 17:06:58 +02:00
Mug
4132293d2d Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into local-lib 2023-04-10 17:00:42 +02:00
Mug
76131d5bb8 Use environment variable for library override 2023-04-10 17:00:35 +02:00
Andrei Betlen
1f67ad2a0b Add use_mmap option 2023-04-10 02:11:35 -04:00
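
The new use_mmap flag memory-maps model weights from disk instead of reading them eagerly. A hedged sketch of how it would be passed, assuming the high-level Llama constructor and a placeholder model path:

```python
from llama_cpp import Llama

# use_mmap=True memory-maps the weights file rather than loading it up front;
# the model path is just a placeholder.
llm = Llama(model_path="./models/ggml-model.bin", use_mmap=True)
```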
Andrei Betlen
c3c2623e8b Update llama.cpp 2023-04-09 22:01:33 -04:00
Andrei Betlen
314ce7d1cc Fix cpu count default 2023-04-08 19:54:04 -04:00
Andrei Betlen
3fbc06361f Formatting 2023-04-08 16:01:45 -04:00
Andrei Betlen
0067c1a588 Formatting 2023-04-08 16:01:18 -04:00
Andrei Betlen
38f442deb0 Bugfix: Wrong size of embeddings. Closes #47 2023-04-08 15:05:33 -04:00
Andrei Betlen
ae3e9c3d6f Update shared library extension for macos 2023-04-08 02:45:21 -04:00
Andrei Betlen
da539cc2ee Safer calculation of default n_threads 2023-04-06 21:22:19 -04:00
Andrei Betlen
930db37dd2 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-04-06 21:07:38 -04:00
Andrei Betlen
55279b679d Handle prompt list 2023-04-06 21:07:35 -04:00
MillionthOdin16
c283edd7f2 Set n_batch to its default value and reduce the thread count:
Change the batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't), but I changed it to the default in case that is still a problem.

Set the auto-determined number of threads to 1/2 of the system count. ggml will sometimes lock cores at 100% while doing nothing; this is being addressed, but it can make for a bad experience if the user's cores are pegged at 100%.
2023-04-05 18:17:29 -04:00
MillionthOdin16
76a82babef Set n_batch to the default value of 8. I think this is leftover from when n_ctx was missing and n_batch was 2048. 2023-04-05 17:44:53 -04:00
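
A sketch of the defaulting logic these two n_batch/thread-count commits describe: the batch size pinned to llama.cpp's default of 8, and the auto-detected thread count halved so ggml's busy-waiting threads don't peg every core. The helper below is illustrative, not the package's actual code:

```python
import multiprocessing

N_BATCH_DEFAULT = 8  # llama.cpp's default batch size, per the commit message

def default_n_threads() -> int:
    # Use half the logical cores (at least 1) so ggml's spinning
    # worker threads don't lock the whole machine at 100%.
    return max(multiprocessing.cpu_count() // 2, 1)
```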
Andrei Betlen
44448fb3a8 Add server as a subpackage 2023-04-05 16:23:25 -04:00
Mug
e3ea354547 Allow local llama library usage 2023-04-05 14:23:01 +02:00
Andrei Betlen
e96a5c5722 Make Llama instance pickleable. Closes #27 2023-04-05 06:52:17 -04:00
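
Making the instance pickleable means a configured Llama can be serialized and restored; presumably the constructor arguments are what travel, since the underlying C context cannot be serialized. A minimal sketch under that assumption, with a placeholder model path:

```python
import pickle
from llama_cpp import Llama

llm = Llama(model_path="./models/ggml-model.bin")  # placeholder path
blob = pickle.dumps(llm)       # serializes the construction state
restored = pickle.loads(blob)  # re-creates the model on deserialization
```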
Andrei Betlen
7643f6677d Bugfix for Python3.7 2023-04-05 04:37:33 -04:00
Andrei Betlen
cefc69ea43 Add runtime check to ensure embedding is enabled if trying to generate embeddings 2023-04-05 03:25:37 -04:00
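
The check guards against requesting embeddings from a model that wasn't loaded with embeddings enabled. A hedged usage sketch, assuming the embedding constructor flag and the high-level create_embedding method:

```python
from llama_cpp import Llama

# embedding=True must be set at load time; without it, asking for
# embeddings now raises a runtime error instead of failing silently.
llm = Llama(model_path="./models/ggml-model.bin", embedding=True)
emb = llm.create_embedding("Hello, world!")
```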
Andrei Betlen
5c50af7462 Remove workaround 2023-04-05 03:25:09 -04:00
Andrei Betlen
51dbcf2693 Bugfix: wrong signature for quantize function 2023-04-04 22:36:59 -04:00
Andrei Betlen
c137789143 Add verbose flag. Closes #19 2023-04-04 13:09:24 -04:00
Andrei Betlen
5075c16fcc Bugfix: n_batch should always be <= n_ctx 2023-04-04 13:08:21 -04:00
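
The invariant behind this fix is simple: the batch size can never exceed the context window. Something along these lines, as a sketch with illustrative values:

```python
n_ctx = 512     # context window (illustrative)
n_batch = 2048  # requested batch size (illustrative)

# The fix enforces the invariant n_batch <= n_ctx:
n_batch = min(n_batch, n_ctx)  # -> 512
```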
Andrei Betlen
caf3c0362b Add return type for default __call__ method 2023-04-03 20:26:08 -04:00
Andrei Betlen
4aa349d777 Add docstring for create_chat_completion 2023-04-03 20:24:20 -04:00
Andrei Betlen
7fedf16531 Add support for chat completion 2023-04-03 20:12:44 -04:00
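
The create_chat_completion method added here (and documented in the commit above) takes an OpenAI-style list of messages. A minimal sketch with a placeholder model path; the response shape is assumed to mirror the OpenAI chat-completion format:

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/ggml-model.bin")  # placeholder path
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Name the planets in the solar system."}
    ]
)
print(response["choices"][0]["message"]["content"])
```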
Andrei Betlen
3dec778c90 Update to more sensible return signature 2023-04-03 20:12:14 -04:00
Andrei Betlen
ae004eb69e Fix #16 2023-04-03 18:46:19 -04:00
MillionthOdin16
a0758f0077 Update llama_cpp.py per PR review: rename lib_base_name and load_shared_library to _lib_base_name and _load_shared_library 2023-04-03 13:06:50 -04:00
MillionthOdin16
a40476e299
Update llama_cpp.py
Make shared library code more robust with some platform specific functionality and more descriptive errors when failures occur
2023-04-02 21:50:13 -04:00
Andrei Betlen
1ed8cd023d Update llama_cpp and add kv_cache api support 2023-04-02 13:33:49 -04:00
Andrei Betlen
4f509b963e Bugfix: Stop sequences and missing max_tokens check 2023-04-02 03:59:19 -04:00
Andrei Betlen
353e18a781 Move workaround to new sample method 2023-04-02 00:06:34 -04:00
Andrei Betlen
a4a1bbeaa9 Update api to allow for easier interactive mode 2023-04-02 00:02:47 -04:00
Andrei Betlen
eef627c09c Fix example documentation 2023-04-01 17:39:35 -04:00
Andrei Betlen
1e4346307c Add documentation for generate method 2023-04-01 17:36:30 -04:00
Andrei Betlen
67c70cc8eb Add static methods for beginning and end of sequence tokens. 2023-04-01 17:29:30 -04:00
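
These accessors expose llama.cpp's special token ids without needing a loaded model, which is why they are static. A sketch assuming the method names token_bos and token_eos:

```python
from llama_cpp import Llama

# Beginning- and end-of-sequence token ids as static methods, so no
# model instance is required (method names assumed from the commit).
bos_id = Llama.token_bos()
eos_id = Llama.token_eos()
print(bos_id, eos_id)
```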
Andrei Betlen
318eae237e Update high-level api 2023-04-01 13:01:27 -04:00
Andrei Betlen
69e7d9f60e Add type definitions 2023-04-01 12:59:58 -04:00
Andrei Betlen
49c8df369a Fix type signature of token_to_str 2023-03-31 03:25:12 -04:00
Andrei Betlen
670d390001 Fix ctypes typing issue for Arrays 2023-03-31 03:20:15 -04:00
Andrei Betlen
1545b22727 Fix array type signatures 2023-03-31 02:08:20 -04:00
Andrei Betlen
c928e0afc8 Formatting 2023-03-31 00:00:27 -04:00
Andrei Betlen
8908f4614c Update llama.cpp 2023-03-28 21:10:23 -04:00
Andrei Betlen
70b8a1ef75 Add support to get embeddings from high-level api. Closes #4 2023-03-28 04:59:54 -04:00