Andrei Betlen
5a3413eee3
Update cpu_count
2023-05-07 03:03:57 -04:00
Andrei Betlen
1a00e452ea
Update settings fields and defaults
2023-05-07 02:52:20 -04:00
Andrei Betlen
86753976c4
Revert "llama_cpp server: delete some ignored / unused parameters"
...
This reverts commit b47b9549d5.
2023-05-07 02:02:34 -04:00
Andrei Betlen
c382d8f86a
Revert "llama_cpp server: mark model as required"
...
This reverts commit e40fcb0575.
2023-05-07 02:00:22 -04:00
Andrei Betlen
d8fddcce73
Merge branch 'main' of github.com:abetlen/llama_cpp_python into better-server-params-and-fields
2023-05-07 01:54:00 -04:00
Andrei Betlen
7c3743fe5f
Update llama.cpp
2023-05-07 00:12:47 -04:00
Andrei Betlen
bc853e3742
Fix type for eval_logits in LlamaState object
2023-05-06 21:32:50 -04:00
Maximilian Winter
515d9bde7e
Fixed some things and activated cuBLAS
2023-05-06 23:40:19 +02:00
Maximilian Winter
aa203a0d65
Added mirostat sampling to the high level API.
2023-05-06 22:47:47 +02:00
Andrei Betlen
98bbd1c6a8
Fix eval logits type
2023-05-05 14:23:14 -04:00
Andrei Betlen
b5f3e74627
Add return type annotations for embeddings and logits
2023-05-05 14:22:55 -04:00
Andrei Betlen
3e28e0e50c
Fix: runtime type errors
2023-05-05 14:12:26 -04:00
Andrei Betlen
e24c3d7447
Prefer explicit imports
2023-05-05 14:05:31 -04:00
Andrei Betlen
40501435c1
Fix: types
2023-05-05 14:04:12 -04:00
Andrei Betlen
66e28eb548
Fix temperature bug
2023-05-05 14:00:41 -04:00
Andrei Betlen
6702d2abfd
Fix candidates type
2023-05-05 14:00:30 -04:00
Andrei Betlen
5e7ddfc3d6
Fix llama_cpp types
2023-05-05 13:54:22 -04:00
Andrei Betlen
b6a9a0b6ba
Add types for all low-level api functions
2023-05-05 12:22:27 -04:00
Andrei Betlen
5be0efa5f8
Cache should raise KeyError when key is missing
2023-05-05 12:21:49 -04:00
Andrei Betlen
24fc38754b
Add cli options to server. Closes #37
2023-05-05 12:08:28 -04:00
Andrei Betlen
853dc711cc
Format
2023-05-04 21:58:36 -04:00
Andrei Betlen
97c6372350
Rewind model to longest prefix.
2023-05-04 21:58:27 -04:00
Andrei Betlen
329297fafb
Bugfix: Missing logits_to_logprobs
2023-05-04 12:18:40 -04:00
Lucas Doyle
3008a954c1
Merge branch 'main' of github.com:abetlen/llama-cpp-python into better-server-params-and-fields
2023-05-03 13:10:03 -07:00
Andrei Betlen
9e5b6d675a
Improve logging messages
2023-05-03 10:28:10 -04:00
Andrei Betlen
43f2907e3a
Support smaller state sizes
2023-05-03 09:33:50 -04:00
Andrei Betlen
1d47cce222
Update llama.cpp
2023-05-03 09:33:30 -04:00
Lucas Doyle
b9098b0ef7
llama_cpp server: prompt is a string
...
Not sure why this union type was here, but taking a look at llama.py, prompt is only ever processed as a string for completion.
This was breaking types when generating an OpenAPI client.
2023-05-02 14:47:07 -07:00
Matt Hoffner
f97ff3c5bb
Update llama_cpp.py
2023-05-01 20:40:06 -07:00
Andrei
7ab08b8d10
Merge branch 'main' into better-server-params-and-fields
2023-05-01 22:45:57 -04:00
Andrei Betlen
9eafc4c49a
Refactor server to use factory
2023-05-01 22:38:46 -04:00
Andrei Betlen
dd9ad1c759
Formatting
2023-05-01 21:51:16 -04:00
Lucas Doyle
dbbfc4ba2f
llama_cpp server: fix to ChatCompletionRequestMessage
...
When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage.
These changes fix that:
- I think `Union[Literal["user"], Literal["channel"], ...]` is the same as `Literal["user", "channel", ...]`
- Turns out the default value `Literal["user"]` isn't JSON serializable, so replace it with `"user"`
2023-05-01 15:38:19 -07:00
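The two claims in the body of dbbfc4ba2f can be checked with a short sketch. It is an illustration, not code from the repository: the helper names `allowed_values` and `is_json_serializable` are hypothetical, and only the two role names that appear in the commit message are used.

```python
import json
from typing import Literal, Union, get_args

# Only "user" and "channel" are named in the commit message; the real field
# may list more roles.
RoleUnion = Union[Literal["user"], Literal["channel"]]
RoleFlat = Literal["user", "channel"]


def allowed_values(tp):
    """Collect the literal values accepted by either spelling (hypothetical helper)."""
    values = set()
    for arg in get_args(tp):
        inner = get_args(arg)  # () for a plain value, ("user",) for Literal["user"]
        values.update(inner if inner else (arg,))
    return values


def is_json_serializable(value):
    """Return True if json.dumps accepts the value (hypothetical helper)."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


# Both spellings accept the same set of values.
print(allowed_values(RoleUnion) == allowed_values(RoleFlat))
# The typing object is not a valid JSON default, the plain string is.
print(is_json_serializable(Literal["user"]), is_json_serializable("user"))
```

The flat `Literal[...]` form accepts the same values as the union of single-value literals, and using the plain string `"user"` as the default avoids handing a typing object to the JSON encoder.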
Lucas Doyle
fa2a61e065
llama_cpp server: fields for the embedding endpoint
2023-05-01 15:38:19 -07:00
Lucas Doyle
8dcbf65a45
llama_cpp server: define fields for chat completions
...
Slight refactor for common fields shared between completion and chat completion
2023-05-01 15:38:19 -07:00
Lucas Doyle
978b6daf93
llama_cpp server: add some more information to fields for completions
2023-05-01 15:38:19 -07:00
Lucas Doyle
a5aa6c1478
llama_cpp server: add missing top_k param to CreateChatCompletionRequest
...
`llama.create_chat_completion` definitely has a `top_k` argument, but it's missing from `CreateChatCompletionRequest`. decision: add it
2023-05-01 15:38:19 -07:00
Lucas Doyle
1e42913599
llama_cpp server: move logprobs to supported
...
I think this is actually supported (it's in the arguments of `Llama.__call__`, which is how the completion is invoked). decision: mark as supported
2023-05-01 15:38:19 -07:00
Lucas Doyle
b47b9549d5
llama_cpp server: delete some ignored / unused parameters
...
`n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. decision: delete it
2023-05-01 15:38:19 -07:00
Lucas Doyle
e40fcb0575
llama_cpp server: mark model as required
...
`model` is ignored, but currently marked "optional"... on the one hand could mark "required" to make it explicit in case the server supports multiple llamas at the same time, but also could delete it since it's ignored. decision: mark it required for the sake of OpenAI API compatibility.
I think out of all parameters, `model` is probably the most important one for people to keep using even if it's ignored for now.
2023-05-01 15:38:19 -07:00
Andrei Betlen
b6747f722e
Fix logprob calculation. Fixes #134
2023-05-01 17:45:08 -04:00
Andrei Betlen
9ff9cdd7fc
Fix import error
2023-05-01 15:11:15 -04:00
Andrei Betlen
350a1769e1
Update sampling api
2023-05-01 14:47:55 -04:00
Andrei Betlen
7837c3fdc7
Fix return types and import comments
2023-05-01 14:02:06 -04:00
Andrei Betlen
ccf1ed54ae
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-05-01 11:35:14 -04:00
Andrei Betlen
80184a286c
Update llama.cpp
2023-05-01 10:44:28 -04:00
Lucas Doyle
efe8e6f879
llama_cpp server: slight refactor to init_llama function
...
Define an init_llama function that starts llama with supplied settings instead of just doing it in the global context of app.py
This allows the test to be less brittle by not needing to mess with os.environ, then importing the app
2023-04-29 11:42:23 -07:00
Lucas Doyle
6d8db9d017
tests: simple test for server module
2023-04-29 11:42:20 -07:00
Lucas Doyle
468377b0e2
llama_cpp server: app is now importable, still runnable as a module
2023-04-29 11:41:25 -07:00
Andrei
755f9fa455
Merge pull request #118 from SagsMug/main
...
Fix UnicodeDecodeError permanently
2023-04-29 07:19:01 -04:00
Mug
18a0c10032
Remove excessive errors="ignore" and add utf8 test
2023-04-29 12:19:22 +02:00
Andrei Betlen
ea0faabae1
Update llama.cpp
2023-04-28 15:32:43 -04:00
Mug
b7d14efc8b
Python weirdness
2023-04-28 13:20:31 +02:00
Mug
eed61289b6
Don't detect off tokens, detect off detokenized utf8
2023-04-28 13:16:18 +02:00
Mug
3a98747026
One day, I'll fix off by 1 errors permanently too
2023-04-28 12:54:28 +02:00
Mug
c39547a986
Detect multi-byte responses and wait
2023-04-28 12:50:30 +02:00
Andrei Betlen
9339929f56
Update llama.cpp
2023-04-26 20:00:54 -04:00
Mug
5f81400fcb
Also ignore errors on input prompts
2023-04-26 14:45:51 +02:00
Mug
be2c961bc9
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python
2023-04-26 14:38:09 +02:00
Mug
c4a8491d42
Fix decode errors permanently
2023-04-26 14:37:06 +02:00
Andrei Betlen
cbd26fdcc1
Update llama.cpp
2023-04-25 19:03:41 -04:00
Andrei Betlen
3cab3ef4cb
Update n_batch for server
2023-04-25 09:11:32 -04:00
Andrei Betlen
cc706fb944
Add ctx check and re-order __init__. Closes #112
2023-04-25 09:00:53 -04:00
Andrei Betlen
d484c5634e
Bugfix: Check cache keys as prefix to prompt tokens
2023-04-24 22:18:54 -04:00
Andrei Betlen
cbe95bbb75
Add cache implementation using llama state
2023-04-24 19:54:41 -04:00
Andrei Betlen
2c359a28ff
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-04-24 17:51:27 -04:00
Andrei Betlen
197cf80601
Add save/load state api for Llama class
2023-04-24 17:51:25 -04:00
Andrei Betlen
86f8e5ad91
Refactor internal state for Llama class
2023-04-24 15:47:54 -04:00
Andrei
f37456133a
Merge pull request #108 from eiery/main
...
Update n_batch default to 512 to match upstream llama.cpp
2023-04-24 13:48:09 -04:00
Andrei Betlen
02cf881317
Update llama.cpp
2023-04-24 09:30:10 -04:00
eiery
aa12d8a81f
Update llama.py
...
update n_batch default to 512 to match upstream llama.cpp
2023-04-23 20:56:40 -04:00
Andrei Betlen
7230599593
Disable mmap when applying lora weights. Closes #107
2023-04-23 14:53:17 -04:00
Andrei Betlen
e99caedbbd
Update llama.cpp
2023-04-22 19:50:28 -04:00
Andrei Betlen
1eb130a6b2
Update llama.cpp
2023-04-21 17:40:27 -04:00
Andrei Betlen
e4647c75ec
Add use_mmap flag to server
2023-04-19 15:57:46 -04:00
Andrei Betlen
0df4d69c20
If lora base is not set avoid re-loading the model by passing NULL
2023-04-18 23:45:25 -04:00
Andrei Betlen
95c0dc134e
Update type signature to allow for null pointer to be passed.
2023-04-18 23:44:46 -04:00
Andrei Betlen
453e517fd5
Add separate lora_base path for applying LoRA to quantized models using original unquantized model weights.
2023-04-18 10:20:46 -04:00
Andrei Betlen
eb7f278cc6
Add lora_path parameter to Llama model
2023-04-18 01:43:44 -04:00
Andrei Betlen
35abf89552
Add bindings for LoRA adapters. Closes #88
2023-04-18 01:30:04 -04:00
Andrei Betlen
89856ef00d
Bugfix: only eval new tokens
2023-04-15 17:32:53 -04:00
Andrei Betlen
92c077136d
Add experimental cache
2023-04-15 12:03:09 -04:00
Andrei Betlen
a6372a7ae5
Update stop sequences for chat
2023-04-15 12:02:48 -04:00
Andrei Betlen
83b2be6dc4
Update chat parameters
2023-04-15 11:58:43 -04:00
Andrei Betlen
62087514c6
Update chat prompt
2023-04-15 11:58:19 -04:00
Andrei Betlen
02f9fb82fb
Bugfix
2023-04-15 11:39:52 -04:00
Andrei Betlen
3cd67c7bd7
Add type annotations
2023-04-15 11:39:21 -04:00
Andrei Betlen
d7de0e8014
Bugfix
2023-04-15 00:08:04 -04:00
Andrei Betlen
e90e122f2a
Use clear
2023-04-14 23:33:18 -04:00
Andrei Betlen
ac7068a469
Track generated tokens internally
2023-04-14 23:33:00 -04:00
Andrei Betlen
6e298d8fca
Set kv cache size to f16 by default
2023-04-14 22:21:19 -04:00
Andrei Betlen
6c7cec0c65
Fix completion request
2023-04-14 10:01:15 -04:00
Andrei Betlen
6153baab2d
Clean up logprobs implementation
2023-04-14 09:59:33 -04:00
Andrei Betlen
26cc4ee029
Fix signature for stop parameter
2023-04-14 09:59:08 -04:00
Andrei Betlen
6595ad84bf
Add field to disable resetting between generations
2023-04-13 00:28:00 -04:00
Andrei Betlen
22fa5a621f
Revert "Deprecate generate method"
...
This reverts commit 6cf5876538.
2023-04-13 00:19:55 -04:00
Andrei Betlen
4f5f99ef2a
Formatting
2023-04-12 22:40:12 -04:00
Andrei Betlen
0daf16defc
Enable logprobs on completion endpoint
2023-04-12 19:08:11 -04:00
Andrei Betlen
19598ac4e8
Fix threading bug. Closes #62
2023-04-12 19:07:53 -04:00
Andrei Betlen
005c78d26c
Update llama.cpp
2023-04-12 14:29:00 -04:00
Andrei Betlen
c854c2564b
Don't serialize stateful parameters
2023-04-12 14:07:14 -04:00
Andrei Betlen
2f9b649005
Style fix
2023-04-12 14:06:22 -04:00
Andrei Betlen
6cf5876538
Deprecate generate method
2023-04-12 14:06:04 -04:00
Andrei Betlen
b3805bb9cc
Implement logprobs parameter for text completion. Closes #2
2023-04-12 14:05:11 -04:00
Andrei Betlen
9f1e565594
Update llama.cpp
2023-04-11 11:59:03 -04:00
Andrei Betlen
213cc5c340
Remove async from function signature to avoid blocking the server
2023-04-11 11:54:31 -04:00
Mug
2559e5af9b
Changed the environment variable name into "LLAMA_CPP_LIB"
2023-04-10 17:27:17 +02:00
Mug
ee71ce8ab7
Make Windows users happy (hopefully)
2023-04-10 17:12:25 +02:00
Mug
cf339c9b3c
Better custom library debugging
2023-04-10 17:06:58 +02:00
Mug
4132293d2d
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into local-lib
2023-04-10 17:00:42 +02:00
Mug
76131d5bb8
Use environment variable for library override
2023-04-10 17:00:35 +02:00
Andrei Betlen
1f67ad2a0b
Add use_mmap option
2023-04-10 02:11:35 -04:00
Andrei Betlen
c3c2623e8b
Update llama.cpp
2023-04-09 22:01:33 -04:00
Andrei Betlen
314ce7d1cc
Fix cpu count default
2023-04-08 19:54:04 -04:00
Andrei Betlen
3fbc06361f
Formatting
2023-04-08 16:01:45 -04:00
Andrei Betlen
0067c1a588
Formatting
2023-04-08 16:01:18 -04:00
Andrei Betlen
38f442deb0
Bugfix: Wrong size of embeddings. Closes #47
2023-04-08 15:05:33 -04:00
Andrei Betlen
ae3e9c3d6f
Update shared library extension for macOS
2023-04-08 02:45:21 -04:00
Andrei Betlen
da539cc2ee
Safer calculation of default n_threads
2023-04-06 21:22:19 -04:00
Andrei Betlen
930db37dd2
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-04-06 21:07:38 -04:00
Andrei Betlen
55279b679d
Handle prompt list
2023-04-06 21:07:35 -04:00
MillionthOdin16
c283edd7f2
Set n_batch to default values and reduce thread count:
...
Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't), but in case that's still an issue I changed it to the default.
Set the auto-determined number of threads to 1/2 of the system count. ggml will sometimes lock cores at 100% while doing nothing. This is being addressed, but it can cause a bad experience for the user if cores are pegged at 100%.
2023-04-05 18:17:29 -04:00
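The thread-count default described in c283edd7f2 can be sketched roughly as follows. This is an assumption-laden illustration rather than the code from the commit: the function name `default_n_threads` and its `cpu_count` parameter are hypothetical, and the floor of one thread is added here so the result stays valid on single-core machines.

```python
import multiprocessing


def default_n_threads(cpu_count=None):
    """Pick half of the available CPUs, but never fewer than one (hypothetical helper)."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    # Integer division halves the count; max() guards the single-core case.
    return max(cpu_count // 2, 1)


print(default_n_threads())
```

Passing `cpu_count` explicitly makes the rule easy to check without depending on the machine that runs it.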
MillionthOdin16
76a82babef
Set n_batch to the default value of 8. I think this is leftover from when n_ctx was missing and n_batch was 2048.
2023-04-05 17:44:53 -04:00
Andrei Betlen
44448fb3a8
Add server as a subpackage
2023-04-05 16:23:25 -04:00
Mug
e3ea354547
Allow local llama library usage
2023-04-05 14:23:01 +02:00
Andrei Betlen
e96a5c5722
Make Llama instance pickleable. Closes #27
2023-04-05 06:52:17 -04:00
Andrei Betlen
7643f6677d
Bugfix for Python 3.7
2023-04-05 04:37:33 -04:00
Andrei Betlen
cefc69ea43
Add runtime check to ensure embedding is enabled if trying to generate embeddings
2023-04-05 03:25:37 -04:00
Andrei Betlen
5c50af7462
Remove workaround
2023-04-05 03:25:09 -04:00
Andrei Betlen
51dbcf2693
Bugfix: wrong signature for quantize function
2023-04-04 22:36:59 -04:00
Andrei Betlen
c137789143
Add verbose flag. Closes #19
2023-04-04 13:09:24 -04:00
Andrei Betlen
5075c16fcc
Bugfix: n_batch should always be <= n_ctx
2023-04-04 13:08:21 -04:00
Andrei Betlen
caf3c0362b
Add return type for default __call__ method
2023-04-03 20:26:08 -04:00
Andrei Betlen
4aa349d777
Add docstring for create_chat_completion
2023-04-03 20:24:20 -04:00
Andrei Betlen
7fedf16531
Add support for chat completion
2023-04-03 20:12:44 -04:00
Andrei Betlen
3dec778c90
Update to more sensible return signature
2023-04-03 20:12:14 -04:00
Andrei Betlen
ae004eb69e
Fix #16
2023-04-03 18:46:19 -04:00
MillionthOdin16
a0758f0077
Update llama_cpp.py with PR requests
...
lib_base_name and load_shared_library
to
_lib_base_name and _load_shared_library
2023-04-03 13:06:50 -04:00
MillionthOdin16
a40476e299
Update llama_cpp.py
...
Make shared library code more robust with some platform specific functionality and more descriptive errors when failures occur
2023-04-02 21:50:13 -04:00
Andrei Betlen
1ed8cd023d
Update llama_cpp and add kv_cache api support
2023-04-02 13:33:49 -04:00
Andrei Betlen
4f509b963e
Bugfix: Stop sequences and missing max_tokens check
2023-04-02 03:59:19 -04:00
Andrei Betlen
353e18a781
Move workaround to new sample method
2023-04-02 00:06:34 -04:00
Andrei Betlen
a4a1bbeaa9
Update api to allow for easier interactive mode
2023-04-02 00:02:47 -04:00
Andrei Betlen
eef627c09c
Fix example documentation
2023-04-01 17:39:35 -04:00
Andrei Betlen
1e4346307c
Add documentation for generate method
2023-04-01 17:36:30 -04:00
Andrei Betlen
67c70cc8eb
Add static methods for beginning and end of sequence tokens.
2023-04-01 17:29:30 -04:00
Andrei Betlen
318eae237e
Update high-level api
2023-04-01 13:01:27 -04:00
Andrei Betlen
69e7d9f60e
Add type definitions
2023-04-01 12:59:58 -04:00
Andrei Betlen
49c8df369a
Fix type signature of token_to_str
2023-03-31 03:25:12 -04:00
Andrei Betlen
670d390001
Fix ctypes typing issue for Arrays
2023-03-31 03:20:15 -04:00