Andrei Betlen
329297fafb
Bugfix: Missing logits_to_logprobs
2023-05-04 12:18:40 -04:00
Andrei Betlen
9e5b6d675a
Improve logging messages
2023-05-03 10:28:10 -04:00
Andrei Betlen
43f2907e3a
Support smaller state sizes
2023-05-03 09:33:50 -04:00
Andrei Betlen
1d47cce222
Update llama.cpp
2023-05-03 09:33:30 -04:00
Matt Hoffner
f97ff3c5bb
Update llama_cpp.py
2023-05-01 20:40:06 -07:00
Andrei Betlen
9eafc4c49a
Refactor server to use factory
2023-05-01 22:38:46 -04:00
Andrei Betlen
dd9ad1c759
Formatting
2023-05-01 21:51:16 -04:00
Andrei Betlen
b6747f722e
Fix logprob calculation. Fixes #134
2023-05-01 17:45:08 -04:00
Andrei Betlen
9ff9cdd7fc
Fix import error
2023-05-01 15:11:15 -04:00
Andrei Betlen
350a1769e1
Update sampling api
2023-05-01 14:47:55 -04:00
Andrei Betlen
7837c3fdc7
Fix return types and import comments
2023-05-01 14:02:06 -04:00
Andrei Betlen
ccf1ed54ae
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-05-01 11:35:14 -04:00
Andrei Betlen
80184a286c
Update llama.cpp
2023-05-01 10:44:28 -04:00
Lucas Doyle
efe8e6f879
llama_cpp server: slight refactor to init_llama function
Define an init_llama function that starts llama with supplied settings instead of just doing it in the global context of app.py.
This makes the test less brittle: it no longer needs to modify os.environ before importing the app.
2023-04-29 11:42:23 -07:00
Lucas Doyle
6d8db9d017
tests: simple test for server module
2023-04-29 11:42:20 -07:00
Lucas Doyle
468377b0e2
llama_cpp server: app is now importable, still runnable as a module
2023-04-29 11:41:25 -07:00
Andrei
755f9fa455
Merge pull request #118 from SagsMug/main
Fix UnicodeDecodeError permanently
2023-04-29 07:19:01 -04:00
Mug
18a0c10032
Remove excessive errors="ignore" and add utf8 test
2023-04-29 12:19:22 +02:00
Andrei Betlen
ea0faabae1
Update llama.cpp
2023-04-28 15:32:43 -04:00
Mug
b7d14efc8b
Python weirdness
2023-04-28 13:20:31 +02:00
Mug
eed61289b6
Don't detect off tokens, detect off detokenized utf8
2023-04-28 13:16:18 +02:00
Mug
3a98747026
One day, I'll fix off-by-1 errors permanently too
2023-04-28 12:54:28 +02:00
Mug
c39547a986
Detect multi-byte responses and wait
2023-04-28 12:50:30 +02:00
Andrei Betlen
9339929f56
Update llama.cpp
2023-04-26 20:00:54 -04:00
Mug
5f81400fcb
Also ignore errors on input prompts
2023-04-26 14:45:51 +02:00
Mug
be2c961bc9
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python
2023-04-26 14:38:09 +02:00
Mug
c4a8491d42
Fix decode errors permanently
2023-04-26 14:37:06 +02:00
Andrei Betlen
cbd26fdcc1
Update llama.cpp
2023-04-25 19:03:41 -04:00
Andrei Betlen
3cab3ef4cb
Update n_batch for server
2023-04-25 09:11:32 -04:00
Andrei Betlen
cc706fb944
Add ctx check and re-order __init__. Closes #112
2023-04-25 09:00:53 -04:00
Andrei Betlen
d484c5634e
Bugfix: Check cache keys as prefix to prompt tokens
2023-04-24 22:18:54 -04:00
Andrei Betlen
cbe95bbb75
Add cache implementation using llama state
2023-04-24 19:54:41 -04:00
Andrei Betlen
2c359a28ff
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-04-24 17:51:27 -04:00
Andrei Betlen
197cf80601
Add save/load state api for Llama class
2023-04-24 17:51:25 -04:00
Andrei Betlen
86f8e5ad91
Refactor internal state for Llama class
2023-04-24 15:47:54 -04:00
Andrei
f37456133a
Merge pull request #108 from eiery/main
Update n_batch default to 512 to match upstream llama.cpp
2023-04-24 13:48:09 -04:00
Andrei Betlen
02cf881317
Update llama.cpp
2023-04-24 09:30:10 -04:00
eiery
aa12d8a81f
Update llama.py
update n_batch default to 512 to match upstream llama.cpp
2023-04-23 20:56:40 -04:00
Andrei Betlen
7230599593
Disable mmap when applying lora weights. Closes #107
2023-04-23 14:53:17 -04:00
Andrei Betlen
e99caedbbd
Update llama.cpp
2023-04-22 19:50:28 -04:00
Andrei Betlen
1eb130a6b2
Update llama.cpp
2023-04-21 17:40:27 -04:00
Andrei Betlen
e4647c75ec
Add use_mmap flag to server
2023-04-19 15:57:46 -04:00
Andrei Betlen
0df4d69c20
If lora base is not set avoid re-loading the model by passing NULL
2023-04-18 23:45:25 -04:00
Andrei Betlen
95c0dc134e
Update type signature to allow for null pointer to be passed.
2023-04-18 23:44:46 -04:00
Andrei Betlen
453e517fd5
Add separate lora_base path for applying LoRA to quantized models using original unquantized model weights.
2023-04-18 10:20:46 -04:00
Andrei Betlen
eb7f278cc6
Add lora_path parameter to Llama model
2023-04-18 01:43:44 -04:00
Andrei Betlen
35abf89552
Add bindings for LoRA adapters. Closes #88
2023-04-18 01:30:04 -04:00
Andrei Betlen
89856ef00d
Bugfix: only eval new tokens
2023-04-15 17:32:53 -04:00
Andrei Betlen
92c077136d
Add experimental cache
2023-04-15 12:03:09 -04:00
Andrei Betlen
a6372a7ae5
Update stop sequences for chat
2023-04-15 12:02:48 -04:00