Commit graph

228 commits

Author SHA1 Message Date
Andrei Betlen
8dfde63255 Fix return type 2023-05-07 19:30:14 -04:00
Andrei Betlen
2753b85321 Format 2023-05-07 13:19:56 -04:00
Andrei Betlen
627811ea83 Add verbose flag to server 2023-05-07 05:09:10 -04:00
Andrei Betlen
3fbda71790 Fix mlock_supported and mmap_supported return type 2023-05-07 03:04:22 -04:00
Andrei Betlen
5a3413eee3 Update cpu_count 2023-05-07 03:03:57 -04:00
Andrei Betlen
1a00e452ea Update settings fields and defaults 2023-05-07 02:52:20 -04:00
Andrei Betlen
86753976c4 Revert "llama_cpp server: delete some ignored / unused parameters"
This reverts commit b47b9549d5.
2023-05-07 02:02:34 -04:00
Andrei Betlen
c382d8f86a Revert "llama_cpp server: mark model as required"
This reverts commit e40fcb0575.
2023-05-07 02:00:22 -04:00
Andrei Betlen
d8fddcce73 Merge branch 'main' of github.com:abetlen/llama_cpp_python into better-server-params-and-fields 2023-05-07 01:54:00 -04:00
Andrei Betlen
7c3743fe5f Update llama.cpp 2023-05-07 00:12:47 -04:00
Andrei Betlen
bc853e3742 Fix type for eval_logits in LlamaState object 2023-05-06 21:32:50 -04:00
Maximilian Winter
515d9bde7e Fixed somethings and activated cublas 2023-05-06 23:40:19 +02:00
Maximilian Winter
aa203a0d65 Added mirostat sampling to the high level API. 2023-05-06 22:47:47 +02:00
Andrei Betlen
98bbd1c6a8 Fix eval logits type 2023-05-05 14:23:14 -04:00
Andrei Betlen
b5f3e74627 Add return type annotations for embeddings and logits 2023-05-05 14:22:55 -04:00
Andrei Betlen
3e28e0e50c Fix: runtime type errors 2023-05-05 14:12:26 -04:00
Andrei Betlen
e24c3d7447 Prefer explicit imports 2023-05-05 14:05:31 -04:00
Andrei Betlen
40501435c1 Fix: types 2023-05-05 14:04:12 -04:00
Andrei Betlen
66e28eb548 Fix temperature bug 2023-05-05 14:00:41 -04:00
Andrei Betlen
6702d2abfd Fix candidates type 2023-05-05 14:00:30 -04:00
Andrei Betlen
5e7ddfc3d6 Fix llama_cpp types 2023-05-05 13:54:22 -04:00
Andrei Betlen
b6a9a0b6ba Add types for all low-level api functions 2023-05-05 12:22:27 -04:00
Andrei Betlen
5be0efa5f8 Cache should raise KeyError when key is missing 2023-05-05 12:21:49 -04:00
Andrei Betlen
24fc38754b Add cli options to server. Closes #37 2023-05-05 12:08:28 -04:00
Andrei Betlen
853dc711cc Format 2023-05-04 21:58:36 -04:00
Andrei Betlen
97c6372350 Rewind model to longest prefix. 2023-05-04 21:58:27 -04:00
Andrei Betlen
329297fafb Bugfix: Missing logits_to_logprobs 2023-05-04 12:18:40 -04:00
Lucas Doyle
3008a954c1 Merge branch 'main' of github.com:abetlen/llama-cpp-python into better-server-params-and-fields 2023-05-03 13:10:03 -07:00
Andrei Betlen
9e5b6d675a Improve logging messages 2023-05-03 10:28:10 -04:00
Andrei Betlen
43f2907e3a Support smaller state sizes 2023-05-03 09:33:50 -04:00
Andrei Betlen
1d47cce222 Update llama.cpp 2023-05-03 09:33:30 -04:00
Lucas Doyle
b9098b0ef7 llama_cpp server: prompt is a string
Not sure why this union type was here but taking a look at llama.py, prompt is only ever processed as a string for completion

This was breaking types when generating an openapi client
2023-05-02 14:47:07 -07:00
Matt Hoffner
f97ff3c5bb
Update llama_cpp.py 2023-05-01 20:40:06 -07:00
Andrei
7ab08b8d10
Merge branch 'main' into better-server-params-and-fields 2023-05-01 22:45:57 -04:00
Andrei Betlen
9eafc4c49a Refactor server to use factory 2023-05-01 22:38:46 -04:00
Andrei Betlen
dd9ad1c759 Formatting 2023-05-01 21:51:16 -04:00
Lucas Doyle
dbbfc4ba2f llama_cpp server: fix to ChatCompletionRequestMessage
When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage

These fix that:
- I think `Union[Literal["user"], Literal["channel"], ...]` is the same as Literal["user", "channel", ...]
- Turns out default value `Literal["user"]` isn't JSON serializable, so replace with "user"
2023-05-01 15:38:19 -07:00
Lucas Doyle
fa2a61e065 llama_cpp server: fields for the embedding endpoint 2023-05-01 15:38:19 -07:00
Lucas Doyle
8dcbf65a45 llama_cpp server: define fields for chat completions
Slight refactor for common fields shared between completion and chat completion
2023-05-01 15:38:19 -07:00
Lucas Doyle
978b6daf93 llama_cpp server: add some more information to fields for completions 2023-05-01 15:38:19 -07:00
Lucas Doyle
a5aa6c1478 llama_cpp server: add missing top_k param to CreateChatCompletionRequest
`llama.create_chat_completion` definitely has a `top_k` argument, but its missing from `CreateChatCompletionRequest`. decision: add it
2023-05-01 15:38:19 -07:00
Lucas Doyle
1e42913599 llama_cpp server: move logprobs to supported
I think this is actually supported (its in the arguments of `LLama.__call__`, which is how the completion is invoked). decision: mark as supported
2023-05-01 15:38:19 -07:00
Lucas Doyle
b47b9549d5 llama_cpp server: delete some ignored / unused parameters
`n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. decision: delete it
2023-05-01 15:38:19 -07:00
Lucas Doyle
e40fcb0575 llama_cpp server: mark model as required
`model` is ignored, but currently marked "optional"... on the one hand could mark "required" to make it explicit in case the server supports multiple llama's at the same time, but also could delete it since its ignored. decision: mark it required for the sake of openai api compatibility.

I think out of all parameters, `model` is probably the most important one for people to keep using even if its ignored for now.
2023-05-01 15:38:19 -07:00
Andrei Betlen
b6747f722e Fix logprob calculation. Fixes #134 2023-05-01 17:45:08 -04:00
Andrei Betlen
9ff9cdd7fc Fix import error 2023-05-01 15:11:15 -04:00
Andrei Betlen
350a1769e1 Update sampling api 2023-05-01 14:47:55 -04:00
Andrei Betlen
7837c3fdc7 Fix return types and import comments 2023-05-01 14:02:06 -04:00
Andrei Betlen
ccf1ed54ae Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-05-01 11:35:14 -04:00
Andrei Betlen
80184a286c Update llama.cpp 2023-05-01 10:44:28 -04:00
Lucas Doyle
efe8e6f879 llama_cpp server: slight refactor to init_llama function
Define an init_llama function that starts llama with supplied settings instead of just doing it in the global context of app.py

This allows the test to be less brittle by not needing to mess with os.environ, then importing the app
2023-04-29 11:42:23 -07:00
Lucas Doyle
6d8db9d017 tests: simple test for server module 2023-04-29 11:42:20 -07:00
Lucas Doyle
468377b0e2 llama_cpp server: app is now importable, still runnable as a module 2023-04-29 11:41:25 -07:00
Andrei
755f9fa455
Merge pull request #118 from SagsMug/main
Fix UnicodeDecodeError permanently
2023-04-29 07:19:01 -04:00
Mug
18a0c10032 Remove excessive errors="ignore" and add utf8 test 2023-04-29 12:19:22 +02:00
Andrei Betlen
ea0faabae1 Update llama.cpp 2023-04-28 15:32:43 -04:00
Mug
b7d14efc8b Python weirdness 2023-04-28 13:20:31 +02:00
Mug
eed61289b6 Dont detect off tokens, detect off detokenized utf8 2023-04-28 13:16:18 +02:00
Mug
3a98747026 One day, i'll fix off by 1 errors permanently too 2023-04-28 12:54:28 +02:00
Mug
c39547a986 Detect multi-byte responses and wait 2023-04-28 12:50:30 +02:00
Andrei Betlen
9339929f56 Update llama.cpp 2023-04-26 20:00:54 -04:00
Mug
5f81400fcb Also ignore errors on input prompts 2023-04-26 14:45:51 +02:00
Mug
be2c961bc9 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python 2023-04-26 14:38:09 +02:00
Mug
c4a8491d42 Fix decode errors permanently 2023-04-26 14:37:06 +02:00
Andrei Betlen
cbd26fdcc1 Update llama.cpp 2023-04-25 19:03:41 -04:00
Andrei Betlen
3cab3ef4cb Update n_batch for server 2023-04-25 09:11:32 -04:00
Andrei Betlen
cc706fb944 Add ctx check and re-order __init__. Closes #112 2023-04-25 09:00:53 -04:00
Andrei Betlen
d484c5634e Bugfix: Check cache keys as prefix to prompt tokens 2023-04-24 22:18:54 -04:00
Andrei Betlen
cbe95bbb75 Add cache implementation using llama state 2023-04-24 19:54:41 -04:00
Andrei Betlen
2c359a28ff Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-04-24 17:51:27 -04:00
Andrei Betlen
197cf80601 Add save/load state api for Llama class 2023-04-24 17:51:25 -04:00
Andrei Betlen
86f8e5ad91 Refactor internal state for Llama class 2023-04-24 15:47:54 -04:00
Andrei
f37456133a
Merge pull request #108 from eiery/main
Update n_batch default to 512 to match upstream llama.cpp
2023-04-24 13:48:09 -04:00
Andrei Betlen
02cf881317 Update llama.cpp 2023-04-24 09:30:10 -04:00
eiery
aa12d8a81f
Update llama.py
update n_batch default to 512 to match upstream llama.cpp
2023-04-23 20:56:40 -04:00
Andrei Betlen
7230599593 Disable mmap when applying lora weights. Closes #107 2023-04-23 14:53:17 -04:00
Andrei Betlen
e99caedbbd Update llama.cpp 2023-04-22 19:50:28 -04:00
Andrei Betlen
1eb130a6b2 Update llama.cpp 2023-04-21 17:40:27 -04:00
Andrei Betlen
e4647c75ec Add use_mmap flag to server 2023-04-19 15:57:46 -04:00
Andrei Betlen
0df4d69c20 If lora base is not set avoid re-loading the model by passing NULL 2023-04-18 23:45:25 -04:00
Andrei Betlen
95c0dc134e Update type signature to allow for null pointer to be passed. 2023-04-18 23:44:46 -04:00
Andrei Betlen
453e517fd5 Add seperate lora_base path for applying LoRA to quantized models using original unquantized model weights. 2023-04-18 10:20:46 -04:00
Andrei Betlen
eb7f278cc6 Add lora_path parameter to Llama model 2023-04-18 01:43:44 -04:00
Andrei Betlen
35abf89552 Add bindings for LoRA adapters. Closes #88 2023-04-18 01:30:04 -04:00
Andrei Betlen
89856ef00d Bugfix: only eval new tokens 2023-04-15 17:32:53 -04:00
Andrei Betlen
92c077136d Add experimental cache 2023-04-15 12:03:09 -04:00
Andrei Betlen
a6372a7ae5 Update stop sequences for chat 2023-04-15 12:02:48 -04:00
Andrei Betlen
83b2be6dc4 Update chat parameters 2023-04-15 11:58:43 -04:00
Andrei Betlen
62087514c6 Update chat prompt 2023-04-15 11:58:19 -04:00
Andrei Betlen
02f9fb82fb Bugfix 2023-04-15 11:39:52 -04:00
Andrei Betlen
3cd67c7bd7 Add type annotations 2023-04-15 11:39:21 -04:00
Andrei Betlen
d7de0e8014 Bugfix 2023-04-15 00:08:04 -04:00
Andrei Betlen
e90e122f2a Use clear 2023-04-14 23:33:18 -04:00
Andrei Betlen
ac7068a469 Track generated tokens internally 2023-04-14 23:33:00 -04:00
Andrei Betlen
6e298d8fca Set kv cache size to f16 by default 2023-04-14 22:21:19 -04:00
Andrei Betlen
6c7cec0c65 Fix completion request 2023-04-14 10:01:15 -04:00
Andrei Betlen
6153baab2d Clean up logprobs implementation 2023-04-14 09:59:33 -04:00
Andrei Betlen
26cc4ee029 Fix signature for stop parameter 2023-04-14 09:59:08 -04:00
Andrei Betlen
6595ad84bf Add field to disable reseting between generations 2023-04-13 00:28:00 -04:00
Andrei Betlen
22fa5a621f Revert "Deprecate generate method"
This reverts commit 6cf5876538.
2023-04-13 00:19:55 -04:00