Andrei Betlen
da539cc2ee
Safer calculation of default n_threads
2023-04-06 21:22:19 -04:00
Andrei Betlen
930db37dd2
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-04-06 21:07:38 -04:00
Andrei Betlen
55279b679d
Handle prompt list
2023-04-06 21:07:35 -04:00
MillionthOdin16
c283edd7f2
Set n_batch to default values and reduce thread count:
Change the batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't), but in case that's still an issue I changed it to the default.
Set the auto-determined number of threads to half the system count. ggml will sometimes lock cores at 100% while doing nothing. This is being addressed, but it can cause a bad experience for the user if cores are pegged at 100%.
2023-04-05 18:17:29 -04:00
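As a rough illustration of the thread-count default described in the commit above, here is a minimal sketch. The helper name `default_n_threads` is hypothetical and not taken from the repository. It assumes the default is half of the logical CPU count, with a floor of one so that single-core machines never end up with zero threads.

```python
import multiprocessing
from typing import Optional


def default_n_threads(n_threads: Optional[int] = None) -> int:
    """Hypothetical helper: pick a thread count when the caller gives none.

    Assumption: half of the logical CPU count, but never less than 1.
    """
    # Respect an explicit, positive value supplied by the caller.
    if n_threads is not None and n_threads > 0:
        return n_threads
    # Integer division can yield 0 on a single-core machine, so clamp to 1.
    return max(multiprocessing.cpu_count() // 2, 1)
```

Calling `default_n_threads()` returns the halved count, while `default_n_threads(4)` passes the explicit value through unchanged.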
MillionthOdin16
76a82babef
Set n_batch to the default value of 8. I think this is leftover from when n_ctx was missing and n_batch was 2048.
2023-04-05 17:44:53 -04:00
Andrei Betlen
44448fb3a8
Add server as a subpackage
2023-04-05 16:23:25 -04:00
Andrei Betlen
e96a5c5722
Make Llama instance pickleable. Closes #27
2023-04-05 06:52:17 -04:00
Andrei Betlen
7643f6677d
Bugfix for Python3.7
2023-04-05 04:37:33 -04:00
Andrei Betlen
cefc69ea43
Add runtime check to ensure embedding is enabled if trying to generate embeddings
2023-04-05 03:25:37 -04:00
Andrei Betlen
5c50af7462
Remove workaround
2023-04-05 03:25:09 -04:00
Andrei Betlen
51dbcf2693
Bugfix: wrong signature for quantize function
2023-04-04 22:36:59 -04:00
Andrei Betlen
c137789143
Add verbose flag. Closes #19
2023-04-04 13:09:24 -04:00
Andrei Betlen
5075c16fcc
Bugfix: n_batch should always be <= n_ctx
2023-04-04 13:08:21 -04:00
Andrei Betlen
caf3c0362b
Add return type for default __call__ method
2023-04-03 20:26:08 -04:00
Andrei Betlen
4aa349d777
Add docstring for create_chat_completion
2023-04-03 20:24:20 -04:00
Andrei Betlen
7fedf16531
Add support for chat completion
2023-04-03 20:12:44 -04:00
Andrei Betlen
3dec778c90
Update to more sensible return signature
2023-04-03 20:12:14 -04:00
Andrei Betlen
ae004eb69e
Fix #16
2023-04-03 18:46:19 -04:00
MillionthOdin16
a0758f0077
Update llama_cpp.py with PR requests
Rename lib_base_name and load_shared_library to _lib_base_name and _load_shared_library
2023-04-03 13:06:50 -04:00
MillionthOdin16
a40476e299
Update llama_cpp.py
Make shared library code more robust with some platform-specific functionality and more descriptive errors when failures occur
2023-04-02 21:50:13 -04:00
Andrei Betlen
1ed8cd023d
Update llama_cpp and add kv_cache api support
2023-04-02 13:33:49 -04:00
Andrei Betlen
4f509b963e
Bugfix: Stop sequences and missing max_tokens check
2023-04-02 03:59:19 -04:00
Andrei Betlen
353e18a781
Move workaround to new sample method
2023-04-02 00:06:34 -04:00
Andrei Betlen
a4a1bbeaa9
Update api to allow for easier interactive mode
2023-04-02 00:02:47 -04:00
Andrei Betlen
eef627c09c
Fix example documentation
2023-04-01 17:39:35 -04:00
Andrei Betlen
1e4346307c
Add documentation for generate method
2023-04-01 17:36:30 -04:00
Andrei Betlen
67c70cc8eb
Add static methods for beginning and end of sequence tokens.
2023-04-01 17:29:30 -04:00
Andrei Betlen
318eae237e
Update high-level api
2023-04-01 13:01:27 -04:00
Andrei Betlen
69e7d9f60e
Add type definitions
2023-04-01 12:59:58 -04:00
Andrei Betlen
49c8df369a
Fix type signature of token_to_str
2023-03-31 03:25:12 -04:00
Andrei Betlen
670d390001
Fix ctypes typing issue for Arrays
2023-03-31 03:20:15 -04:00
Andrei Betlen
1545b22727
Fix array type signatures
2023-03-31 02:08:20 -04:00
Andrei Betlen
c928e0afc8
Formatting
2023-03-31 00:00:27 -04:00
Andrei Betlen
8908f4614c
Update llama.cpp
2023-03-28 21:10:23 -04:00
Andrei Betlen
70b8a1ef75
Add support to get embeddings from high-level api. Closes #4
2023-03-28 04:59:54 -04:00
Andrei Betlen
3dbb3fd3f6
Add support for stream parameter. Closes #1
2023-03-28 04:03:57 -04:00
Andrei Betlen
30fc0f3866
Extract generate method
2023-03-28 02:42:22 -04:00
Andrei Betlen
1c823f6d0f
Refactor Llama class and add tokenize / detokenize methods. Closes #3
2023-03-28 01:45:37 -04:00
Andrei Betlen
8ae3beda9c
Update Llama to add params
2023-03-25 16:26:23 -04:00
Andrei Betlen
4525236214
Update llama.cpp
2023-03-25 16:26:03 -04:00
Andrei Betlen
b121b7c05b
Update docstring
2023-03-25 12:33:18 -04:00
Andrei Betlen
fa92740a10
Update llama.cpp
2023-03-25 12:12:09 -04:00
Andrei Betlen
df15caa877
Add mkdocs
2023-03-24 18:57:59 -04:00
Andrei Betlen
4da5faa28b
Bugfix: cross-platform method to find shared lib
2023-03-24 18:43:29 -04:00
Andrei Betlen
b93675608a
Handle errors returned by llama.cpp
2023-03-24 15:47:17 -04:00
Andrei Betlen
7786edb0f9
Black formatting
2023-03-24 14:59:29 -04:00
Andrei Betlen
c784d83131
Update llama.cpp and re-organize low-level api
2023-03-24 14:58:42 -04:00
Andrei Betlen
b9c53b88a1
Use n_ctx provided from actual context not params
2023-03-24 14:58:10 -04:00
Andrei Betlen
2cc499512c
Black formatting
2023-03-24 14:35:41 -04:00
Andrei Betlen
e24c581b5a
Implement prompt batch processing as in main.cpp
2023-03-24 14:33:38 -04:00