Commit graph

628 commits

Author SHA1 Message Date
Lucas Doyle
6d8db9d017 tests: simple test for server module 2023-04-29 11:42:20 -07:00
Lucas Doyle
468377b0e2 llama_cpp server: app is now importable, still runnable as a module 2023-04-29 11:41:25 -07:00
Andrei
755f9fa455 Merge pull request #118 from SagsMug/main
Fix UnicodeDecodeError permanently
2023-04-29 07:19:01 -04:00
Mug
18a0c10032 Remove excessive errors="ignore" and add utf8 test 2023-04-29 12:19:22 +02:00
Andrei Betlen
ea0faabae1 Update llama.cpp 2023-04-28 15:32:43 -04:00
Mug
b7d14efc8b Python weirdness 2023-04-28 13:20:31 +02:00
Mug
eed61289b6 Dont detect off tokens, detect off detokenized utf8 2023-04-28 13:16:18 +02:00
Mug
3a98747026 One day, i'll fix off by 1 errors permanently too 2023-04-28 12:54:28 +02:00
Mug
c39547a986 Detect multi-byte responses and wait 2023-04-28 12:50:30 +02:00
Andrei Betlen
9339929f56 Update llama.cpp 2023-04-26 20:00:54 -04:00
Mug
5f81400fcb Also ignore errors on input prompts 2023-04-26 14:45:51 +02:00
Mug
be2c961bc9 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python 2023-04-26 14:38:09 +02:00
Mug
c4a8491d42 Fix decode errors permanently 2023-04-26 14:37:06 +02:00
Andrei Betlen
cbd26fdcc1 Update llama.cpp 2023-04-25 19:03:41 -04:00
Andrei Betlen
3cab3ef4cb Update n_batch for server 2023-04-25 09:11:32 -04:00
Andrei Betlen
cc706fb944 Add ctx check and re-order __init__. Closes #112 2023-04-25 09:00:53 -04:00
Andrei Betlen
d484c5634e Bugfix: Check cache keys as prefix to prompt tokens 2023-04-24 22:18:54 -04:00
Andrei Betlen
cbe95bbb75 Add cache implementation using llama state 2023-04-24 19:54:41 -04:00
Andrei Betlen
2c359a28ff Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-04-24 17:51:27 -04:00
Andrei Betlen
197cf80601 Add save/load state api for Llama class 2023-04-24 17:51:25 -04:00
Andrei Betlen
86f8e5ad91 Refactor internal state for Llama class 2023-04-24 15:47:54 -04:00
Andrei
f37456133a Merge pull request #108 from eiery/main
Update n_batch default to 512 to match upstream llama.cpp
2023-04-24 13:48:09 -04:00
Andrei Betlen
02cf881317 Update llama.cpp 2023-04-24 09:30:10 -04:00
eiery
aa12d8a81f Update llama.py
update n_batch default to 512 to match upstream llama.cpp
2023-04-23 20:56:40 -04:00
Andrei Betlen
7230599593 Disable mmap when applying lora weights. Closes #107 2023-04-23 14:53:17 -04:00
Andrei Betlen
e99caedbbd Update llama.cpp 2023-04-22 19:50:28 -04:00
Andrei Betlen
1eb130a6b2 Update llama.cpp 2023-04-21 17:40:27 -04:00
Andrei Betlen
e4647c75ec Add use_mmap flag to server 2023-04-19 15:57:46 -04:00
Andrei Betlen
0df4d69c20 If lora base is not set avoid re-loading the model by passing NULL 2023-04-18 23:45:25 -04:00
Andrei Betlen
95c0dc134e Update type signature to allow for null pointer to be passed. 2023-04-18 23:44:46 -04:00
Andrei Betlen
453e517fd5 Add seperate lora_base path for applying LoRA to quantized models using original unquantized model weights. 2023-04-18 10:20:46 -04:00
Andrei Betlen
eb7f278cc6 Add lora_path parameter to Llama model 2023-04-18 01:43:44 -04:00
Andrei Betlen
35abf89552 Add bindings for LoRA adapters. Closes #88 2023-04-18 01:30:04 -04:00
Andrei Betlen
89856ef00d Bugfix: only eval new tokens 2023-04-15 17:32:53 -04:00
Andrei Betlen
92c077136d Add experimental cache 2023-04-15 12:03:09 -04:00
Andrei Betlen
a6372a7ae5 Update stop sequences for chat 2023-04-15 12:02:48 -04:00
Andrei Betlen
83b2be6dc4 Update chat parameters 2023-04-15 11:58:43 -04:00
Andrei Betlen
62087514c6 Update chat prompt 2023-04-15 11:58:19 -04:00
Andrei Betlen
02f9fb82fb Bugfix 2023-04-15 11:39:52 -04:00
Andrei Betlen
3cd67c7bd7 Add type annotations 2023-04-15 11:39:21 -04:00
Andrei Betlen
d7de0e8014 Bugfix 2023-04-15 00:08:04 -04:00
Andrei Betlen
e90e122f2a Use clear 2023-04-14 23:33:18 -04:00
Andrei Betlen
ac7068a469 Track generated tokens internally 2023-04-14 23:33:00 -04:00
Andrei Betlen
6e298d8fca Set kv cache size to f16 by default 2023-04-14 22:21:19 -04:00
Andrei Betlen
6c7cec0c65 Fix completion request 2023-04-14 10:01:15 -04:00
Andrei Betlen
6153baab2d Clean up logprobs implementation 2023-04-14 09:59:33 -04:00
Andrei Betlen
26cc4ee029 Fix signature for stop parameter 2023-04-14 09:59:08 -04:00
Andrei Betlen
6595ad84bf Add field to disable reseting between generations 2023-04-13 00:28:00 -04:00
Andrei Betlen
22fa5a621f Revert "Deprecate generate method"
This reverts commit 6cf5876538.
2023-04-13 00:19:55 -04:00
Andrei Betlen
4f5f99ef2a Formatting 2023-04-12 22:40:12 -04:00
Andrei Betlen
0daf16defc Enable logprobs on completion endpoint 2023-04-12 19:08:11 -04:00
Andrei Betlen
19598ac4e8 Fix threading bug. Closes #62 2023-04-12 19:07:53 -04:00
Andrei Betlen
005c78d26c Update llama.cpp 2023-04-12 14:29:00 -04:00
Andrei Betlen
c854c2564b Don't serialize stateful parameters 2023-04-12 14:07:14 -04:00
Andrei Betlen
2f9b649005 Style fix 2023-04-12 14:06:22 -04:00
Andrei Betlen
6cf5876538 Deprecate generate method 2023-04-12 14:06:04 -04:00
Andrei Betlen
b3805bb9cc Implement logprobs parameter for text completion. Closes #2 2023-04-12 14:05:11 -04:00
Andrei Betlen
9f1e565594 Update llama.cpp 2023-04-11 11:59:03 -04:00
Andrei Betlen
213cc5c340 Remove async from function signature to avoid blocking the server 2023-04-11 11:54:31 -04:00
jm12138
90e1021154 Add unlimited max_tokens 2023-04-10 15:56:05 +00:00
Mug
2559e5af9b Changed the environment variable name into "LLAMA_CPP_LIB" 2023-04-10 17:27:17 +02:00
Mug
ee71ce8ab7 Make windows users happy (hopefully) 2023-04-10 17:12:25 +02:00
Mug
cf339c9b3c Better custom library debugging 2023-04-10 17:06:58 +02:00
Mug
4132293d2d Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into local-lib 2023-04-10 17:00:42 +02:00
Mug
76131d5bb8 Use environment variable for library override 2023-04-10 17:00:35 +02:00
Andrei Betlen
1f67ad2a0b Add use_mmap option 2023-04-10 02:11:35 -04:00
Andrei Betlen
c3c2623e8b Update llama.cpp 2023-04-09 22:01:33 -04:00
Andrei Betlen
314ce7d1cc Fix cpu count default 2023-04-08 19:54:04 -04:00
Andrei Betlen
3fbc06361f Formatting 2023-04-08 16:01:45 -04:00
Andrei Betlen
0067c1a588 Formatting 2023-04-08 16:01:18 -04:00
Andrei Betlen
38f442deb0 Bugfix: Wrong size of embeddings. Closes #47 2023-04-08 15:05:33 -04:00
Andrei Betlen
ae3e9c3d6f Update shared library extension for macos 2023-04-08 02:45:21 -04:00
Andrei Betlen
da539cc2ee Safer calculation of default n_threads 2023-04-06 21:22:19 -04:00
Andrei Betlen
930db37dd2 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-04-06 21:07:38 -04:00
Andrei Betlen
55279b679d Handle prompt list 2023-04-06 21:07:35 -04:00
MillionthOdin16
c283edd7f2 Set n_batch to default values and reduce thread count:
Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects quality of generations. (It shouldn't) But in case that's still an issue I changed to default.

Set auto-determined num of threads to 1/2 system count. ggml will sometimes lock cores at 100% while doing nothing. This is being addressed, but can cause bad experience for user if pegged at 100%
2023-04-05 18:17:29 -04:00
MillionthOdin16
76a82babef Set n_batch to the default value of 8. I think this is leftover from when n_ctx was missing and n_batch was 2048. 2023-04-05 17:44:53 -04:00
Andrei Betlen
44448fb3a8 Add server as a subpackage 2023-04-05 16:23:25 -04:00
Mug
e3ea354547 Allow local llama library usage 2023-04-05 14:23:01 +02:00
Andrei Betlen
e96a5c5722 Make Llama instance pickleable. Closes #27 2023-04-05 06:52:17 -04:00
Andrei Betlen
7643f6677d Bugfix for Python3.7 2023-04-05 04:37:33 -04:00
Andrei Betlen
cefc69ea43 Add runtime check to ensure embedding is enabled if trying to generate embeddings 2023-04-05 03:25:37 -04:00
Andrei Betlen
5c50af7462 Remove workaround 2023-04-05 03:25:09 -04:00
Andrei Betlen
51dbcf2693 Bugfix: wrong signature for quantize function 2023-04-04 22:36:59 -04:00
Andrei Betlen
c137789143 Add verbose flag. Closes #19 2023-04-04 13:09:24 -04:00
Andrei Betlen
5075c16fcc Bugfix: n_batch should always be <= n_ctx 2023-04-04 13:08:21 -04:00
Andrei Betlen
caf3c0362b Add return type for default __call__ method 2023-04-03 20:26:08 -04:00
Andrei Betlen
4aa349d777 Add docstring for create_chat_completion 2023-04-03 20:24:20 -04:00
Andrei Betlen
7fedf16531 Add support for chat completion 2023-04-03 20:12:44 -04:00
Andrei Betlen
3dec778c90 Update to more sensible return signature 2023-04-03 20:12:14 -04:00
Andrei Betlen
ae004eb69e Fix #16 2023-04-03 18:46:19 -04:00
MillionthOdin16
a0758f0077 Update llama_cpp.py with PR requests
lib_base_name and load_shared_library to _lib_base_name and _load_shared_library
2023-04-03 13:06:50 -04:00
MillionthOdin16
a40476e299 Update llama_cpp.py
Make shared library code more robust with some platform specific functionality and more descriptive errors when failures occur
2023-04-02 21:50:13 -04:00
Andrei Betlen
1ed8cd023d Update llama_cpp and add kv_cache api support 2023-04-02 13:33:49 -04:00
Andrei Betlen
4f509b963e Bugfix: Stop sequences and missing max_tokens check 2023-04-02 03:59:19 -04:00
Andrei Betlen
353e18a781 Move workaround to new sample method 2023-04-02 00:06:34 -04:00
Andrei Betlen
a4a1bbeaa9 Update api to allow for easier interactive mode 2023-04-02 00:02:47 -04:00
Andrei Betlen
eef627c09c Fix example documentation 2023-04-01 17:39:35 -04:00
Andrei Betlen
1e4346307c Add documentation for generate method 2023-04-01 17:36:30 -04:00
Andrei Betlen
67c70cc8eb Add static methods for beginning and end of sequence tokens. 2023-04-01 17:29:30 -04:00
Andrei Betlen
318eae237e Update high-level api 2023-04-01 13:01:27 -04:00
Andrei Betlen
69e7d9f60e Add type definitions 2023-04-01 12:59:58 -04:00
Andrei Betlen
49c8df369a Fix type signature of token_to_str 2023-03-31 03:25:12 -04:00
Andrei Betlen
670d390001 Fix ctypes typing issue for Arrays 2023-03-31 03:20:15 -04:00
Andrei Betlen
1545b22727 Fix array type signatures 2023-03-31 02:08:20 -04:00
Andrei Betlen
c928e0afc8 Formatting 2023-03-31 00:00:27 -04:00
Andrei Betlen
8908f4614c Update llama.cpp 2023-03-28 21:10:23 -04:00
Andrei Betlen
70b8a1ef75 Add support to get embeddings from high-level api. Closes #4 2023-03-28 04:59:54 -04:00
Andrei Betlen
3dbb3fd3f6 Add support for stream parameter. Closes #1 2023-03-28 04:03:57 -04:00
Andrei Betlen
30fc0f3866 Extract generate method 2023-03-28 02:42:22 -04:00
Andrei Betlen
1c823f6d0f Refactor Llama class and add tokenize / detokenize methods Closes #3 2023-03-28 01:45:37 -04:00
Andrei Betlen
8ae3beda9c Update Llama to add params 2023-03-25 16:26:23 -04:00
Andrei Betlen
4525236214 Update llama.cpp 2023-03-25 16:26:03 -04:00
Andrei Betlen
b121b7c05b Update docstring 2023-03-25 12:33:18 -04:00
Andrei Betlen
fa92740a10 Update llama.cpp 2023-03-25 12:12:09 -04:00
Andrei Betlen
df15caa877 Add mkdocs 2023-03-24 18:57:59 -04:00
Andrei Betlen
4da5faa28b Bugfix: cross-platform method to find shared lib 2023-03-24 18:43:29 -04:00
Andrei Betlen
b93675608a Handle errors returned by llama.cpp 2023-03-24 15:47:17 -04:00
Andrei Betlen
7786edb0f9 Black formatting 2023-03-24 14:59:29 -04:00
Andrei Betlen
c784d83131 Update llama.cpp and re-organize low-level api 2023-03-24 14:58:42 -04:00
Andrei Betlen
b9c53b88a1 Use n_ctx provided from actual context not params 2023-03-24 14:58:10 -04:00
Andrei Betlen
2cc499512c Black formatting 2023-03-24 14:35:41 -04:00
Andrei Betlen
e24c581b5a Implement prompt batch processing as in main.cpp 2023-03-24 14:33:38 -04:00
Andrei Betlen
a28cb92d8f Remove model_name param 2023-03-24 04:04:29 -04:00
Andrei Betlen
eec9256a42 Bugfix: avoid decoding partial utf-8 characters 2023-03-23 16:25:13 -04:00
Andrei Betlen
e63ea4dbbc Add support for logprobs 2023-03-23 15:51:05 -04:00
Andrei Betlen
465238b179 Updated package to build with skbuild 2023-03-23 13:54:14 -04:00
Andrei Betlen
79b304c9d4 Initial commit 2023-03-23 05:33:06 -04:00