Mug
c4a8491d42
Fix decode errors permanently
2023-04-26 14:37:06 +02:00
Andrei Betlen
89856ef00d
Bugfix: only eval new tokens
2023-04-15 17:32:53 -04:00
Andrei Betlen
92c077136d
Add experimental cache
2023-04-15 12:03:09 -04:00
Andrei Betlen
a6372a7ae5
Update stop sequences for chat
2023-04-15 12:02:48 -04:00
Andrei Betlen
83b2be6dc4
Update chat parameters
2023-04-15 11:58:43 -04:00
Andrei Betlen
62087514c6
Update chat prompt
2023-04-15 11:58:19 -04:00
Andrei Betlen
02f9fb82fb
Bugfix
2023-04-15 11:39:52 -04:00
Andrei Betlen
3cd67c7bd7
Add type annotations
2023-04-15 11:39:21 -04:00
Andrei Betlen
d7de0e8014
Bugfix
2023-04-15 00:08:04 -04:00
Andrei Betlen
e90e122f2a
Use clear
2023-04-14 23:33:18 -04:00
Andrei Betlen
ac7068a469
Track generated tokens internally
2023-04-14 23:33:00 -04:00
Andrei Betlen
6e298d8fca
Set kv cache size to f16 by default
2023-04-14 22:21:19 -04:00
Andrei Betlen
6c7cec0c65
Fix completion request
2023-04-14 10:01:15 -04:00
Andrei Betlen
6153baab2d
Clean up logprobs implementation
2023-04-14 09:59:33 -04:00
Andrei Betlen
26cc4ee029
Fix signature for stop parameter
2023-04-14 09:59:08 -04:00
Andrei Betlen
6595ad84bf
Add field to disable reseting between generations
2023-04-13 00:28:00 -04:00
Andrei Betlen
22fa5a621f
Revert "Deprecate generate method"
...
This reverts commit 6cf5876538
.
2023-04-13 00:19:55 -04:00
Andrei Betlen
4f5f99ef2a
Formatting
2023-04-12 22:40:12 -04:00
Andrei Betlen
0daf16defc
Enable logprobs on completion endpoint
2023-04-12 19:08:11 -04:00
Andrei Betlen
19598ac4e8
Fix threading bug. Closes #62
2023-04-12 19:07:53 -04:00
Andrei Betlen
005c78d26c
Update llama.cpp
2023-04-12 14:29:00 -04:00
Andrei Betlen
c854c2564b
Don't serialize stateful parameters
2023-04-12 14:07:14 -04:00
Andrei Betlen
2f9b649005
Style fix
2023-04-12 14:06:22 -04:00
Andrei Betlen
6cf5876538
Deprecate generate method
2023-04-12 14:06:04 -04:00
Andrei Betlen
b3805bb9cc
Implement logprobs parameter for text completion. Closes #2
2023-04-12 14:05:11 -04:00
Andrei Betlen
9f1e565594
Update llama.cpp
2023-04-11 11:59:03 -04:00
Andrei Betlen
213cc5c340
Remove async from function signature to avoid blocking the server
2023-04-11 11:54:31 -04:00
Mug
2559e5af9b
Changed the environment variable name into "LLAMA_CPP_LIB"
2023-04-10 17:27:17 +02:00
Mug
ee71ce8ab7
Make windows users happy (hopefully)
2023-04-10 17:12:25 +02:00
Mug
cf339c9b3c
Better custom library debugging
2023-04-10 17:06:58 +02:00
Mug
4132293d2d
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into local-lib
2023-04-10 17:00:42 +02:00
Mug
76131d5bb8
Use environment variable for library override
2023-04-10 17:00:35 +02:00
Andrei Betlen
1f67ad2a0b
Add use_mmap option
2023-04-10 02:11:35 -04:00
Andrei Betlen
c3c2623e8b
Update llama.cpp
2023-04-09 22:01:33 -04:00
Andrei Betlen
314ce7d1cc
Fix cpu count default
2023-04-08 19:54:04 -04:00
Andrei Betlen
3fbc06361f
Formatting
2023-04-08 16:01:45 -04:00
Andrei Betlen
0067c1a588
Formatting
2023-04-08 16:01:18 -04:00
Andrei Betlen
38f442deb0
Bugfix: Wrong size of embeddings. Closes #47
2023-04-08 15:05:33 -04:00
Andrei Betlen
ae3e9c3d6f
Update shared library extension for macos
2023-04-08 02:45:21 -04:00
Andrei Betlen
da539cc2ee
Safer calculation of default n_threads
2023-04-06 21:22:19 -04:00
Andrei Betlen
930db37dd2
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-04-06 21:07:38 -04:00
Andrei Betlen
55279b679d
Handle prompt list
2023-04-06 21:07:35 -04:00
MillionthOdin16
c283edd7f2
Set n_batch to default values and reduce thread count:
...
Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects quality of generations. (It shouldn't) But in case that's still an issue I changed to default.
Set auto-determined num of threads to 1/2 system count. ggml will sometimes lock cores at 100% while doing nothing. This is being addressed, but can cause bad experience for user if pegged at 100%
2023-04-05 18:17:29 -04:00
MillionthOdin16
76a82babef
Set n_batch to the default value of 8. I think this is leftover from when n_ctx was missing and n_batch was 2048.
2023-04-05 17:44:53 -04:00
Andrei Betlen
44448fb3a8
Add server as a subpackage
2023-04-05 16:23:25 -04:00
Mug
e3ea354547
Allow local llama library usage
2023-04-05 14:23:01 +02:00
Andrei Betlen
e96a5c5722
Make Llama instance pickleable. Closes #27
2023-04-05 06:52:17 -04:00
Andrei Betlen
7643f6677d
Bugfix for Python3.7
2023-04-05 04:37:33 -04:00
Andrei Betlen
cefc69ea43
Add runtime check to ensure embedding is enabled if trying to generate embeddings
2023-04-05 03:25:37 -04:00
Andrei Betlen
5c50af7462
Remove workaround
2023-04-05 03:25:09 -04:00