Lucas Doyle
|
468377b0e2
|
llama_cpp server: app is now importable, still runnable as a module
|
2023-04-29 11:41:25 -07:00 |
|
Andrei
|
755f9fa455
|
Merge pull request #118 from SagsMug/main
Fix UnicodeDecodeError permanently
|
2023-04-29 07:19:01 -04:00 |
|
Mug
|
18a0c10032
|
Remove excessive errors="ignore" and add utf8 test
|
2023-04-29 12:19:22 +02:00 |
|
Andrei Betlen
|
ea0faabae1
|
Update llama.cpp
|
2023-04-28 15:32:43 -04:00 |
|
Mug
|
b7d14efc8b
|
Python weirdness
|
2023-04-28 13:20:31 +02:00 |
|
Mug
|
eed61289b6
|
Dont detect off tokens, detect off detokenized utf8
|
2023-04-28 13:16:18 +02:00 |
|
Mug
|
3a98747026
|
One day, i'll fix off by 1 errors permanently too
|
2023-04-28 12:54:28 +02:00 |
|
Mug
|
c39547a986
|
Detect multi-byte responses and wait
|
2023-04-28 12:50:30 +02:00 |
|
Andrei Betlen
|
9339929f56
|
Update llama.cpp
|
2023-04-26 20:00:54 -04:00 |
|
Mug
|
5f81400fcb
|
Also ignore errors on input prompts
|
2023-04-26 14:45:51 +02:00 |
|
Mug
|
be2c961bc9
|
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python
|
2023-04-26 14:38:09 +02:00 |
|
Mug
|
c4a8491d42
|
Fix decode errors permanently
|
2023-04-26 14:37:06 +02:00 |
|
Andrei Betlen
|
cbd26fdcc1
|
Update llama.cpp
|
2023-04-25 19:03:41 -04:00 |
|
Andrei Betlen
|
3cab3ef4cb
|
Update n_batch for server
|
2023-04-25 09:11:32 -04:00 |
|
Andrei Betlen
|
cc706fb944
|
Add ctx check and re-order __init__. Closes #112
|
2023-04-25 09:00:53 -04:00 |
|
Andrei Betlen
|
d484c5634e
|
Bugfix: Check cache keys as prefix to prompt tokens
|
2023-04-24 22:18:54 -04:00 |
|
Andrei Betlen
|
cbe95bbb75
|
Add cache implementation using llama state
|
2023-04-24 19:54:41 -04:00 |
|
Andrei Betlen
|
2c359a28ff
|
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
|
2023-04-24 17:51:27 -04:00 |
|
Andrei Betlen
|
197cf80601
|
Add save/load state api for Llama class
|
2023-04-24 17:51:25 -04:00 |
|
Andrei Betlen
|
86f8e5ad91
|
Refactor internal state for Llama class
|
2023-04-24 15:47:54 -04:00 |
|
Andrei
|
f37456133a
|
Merge pull request #108 from eiery/main
Update n_batch default to 512 to match upstream llama.cpp
|
2023-04-24 13:48:09 -04:00 |
|
Andrei Betlen
|
02cf881317
|
Update llama.cpp
|
2023-04-24 09:30:10 -04:00 |
|
eiery
|
aa12d8a81f
|
Update llama.py
update n_batch default to 512 to match upstream llama.cpp
|
2023-04-23 20:56:40 -04:00 |
|
Andrei Betlen
|
7230599593
|
Disable mmap when applying lora weights. Closes #107
|
2023-04-23 14:53:17 -04:00 |
|
Andrei Betlen
|
e99caedbbd
|
Update llama.cpp
|
2023-04-22 19:50:28 -04:00 |
|
Andrei Betlen
|
1eb130a6b2
|
Update llama.cpp
|
2023-04-21 17:40:27 -04:00 |
|
Andrei Betlen
|
e4647c75ec
|
Add use_mmap flag to server
|
2023-04-19 15:57:46 -04:00 |
|
Andrei Betlen
|
0df4d69c20
|
If lora base is not set avoid re-loading the model by passing NULL
|
2023-04-18 23:45:25 -04:00 |
|
Andrei Betlen
|
95c0dc134e
|
Update type signature to allow for null pointer to be passed.
|
2023-04-18 23:44:46 -04:00 |
|
Andrei Betlen
|
453e517fd5
|
Add seperate lora_base path for applying LoRA to quantized models using original unquantized model weights.
|
2023-04-18 10:20:46 -04:00 |
|
Andrei Betlen
|
eb7f278cc6
|
Add lora_path parameter to Llama model
|
2023-04-18 01:43:44 -04:00 |
|
Andrei Betlen
|
35abf89552
|
Add bindings for LoRA adapters. Closes #88
|
2023-04-18 01:30:04 -04:00 |
|
Andrei Betlen
|
89856ef00d
|
Bugfix: only eval new tokens
|
2023-04-15 17:32:53 -04:00 |
|
Andrei Betlen
|
92c077136d
|
Add experimental cache
|
2023-04-15 12:03:09 -04:00 |
|
Andrei Betlen
|
a6372a7ae5
|
Update stop sequences for chat
|
2023-04-15 12:02:48 -04:00 |
|
Andrei Betlen
|
83b2be6dc4
|
Update chat parameters
|
2023-04-15 11:58:43 -04:00 |
|
Andrei Betlen
|
62087514c6
|
Update chat prompt
|
2023-04-15 11:58:19 -04:00 |
|
Andrei Betlen
|
02f9fb82fb
|
Bugfix
|
2023-04-15 11:39:52 -04:00 |
|
Andrei Betlen
|
3cd67c7bd7
|
Add type annotations
|
2023-04-15 11:39:21 -04:00 |
|
Andrei Betlen
|
d7de0e8014
|
Bugfix
|
2023-04-15 00:08:04 -04:00 |
|
Andrei Betlen
|
e90e122f2a
|
Use clear
|
2023-04-14 23:33:18 -04:00 |
|
Andrei Betlen
|
ac7068a469
|
Track generated tokens internally
|
2023-04-14 23:33:00 -04:00 |
|
Andrei Betlen
|
6e298d8fca
|
Set kv cache size to f16 by default
|
2023-04-14 22:21:19 -04:00 |
|
Andrei Betlen
|
6c7cec0c65
|
Fix completion request
|
2023-04-14 10:01:15 -04:00 |
|
Andrei Betlen
|
6153baab2d
|
Clean up logprobs implementation
|
2023-04-14 09:59:33 -04:00 |
|
Andrei Betlen
|
26cc4ee029
|
Fix signature for stop parameter
|
2023-04-14 09:59:08 -04:00 |
|
Andrei Betlen
|
6595ad84bf
|
Add field to disable reseting between generations
|
2023-04-13 00:28:00 -04:00 |
|
Andrei Betlen
|
22fa5a621f
|
Revert "Deprecate generate method"
This reverts commit 6cf5876538 .
|
2023-04-13 00:19:55 -04:00 |
|
Andrei Betlen
|
4f5f99ef2a
|
Formatting
|
2023-04-12 22:40:12 -04:00 |
|
Andrei Betlen
|
0daf16defc
|
Enable logprobs on completion endpoint
|
2023-04-12 19:08:11 -04:00 |
|