Andrei Betlen
3a29d65f45
Update llama.cpp
2023-08-26 23:36:24 -04:00
Andrei Betlen
5de8009706
Add copilot-codex completions endpoint for drop-in copilot usage
2023-08-25 17:49:14 -04:00
Andrei Betlen
ac47d55577
Merge branch 'main' into v0.2-wip
2023-08-25 15:45:22 -04:00
Andrei Betlen
ef23d1e545
Update llama.cpp
2023-08-25 14:35:53 -04:00
Andrei Betlen
48cf43b427
Use _with_model variants for tokenization
2023-08-25 13:43:16 -04:00
Andrei Betlen
8ac59465b9
Strip leading space when de-tokenizing.
2023-08-25 04:56:48 -04:00
Andrei Betlen
c2d1deaa8a
Update llama.cpp
2023-08-24 18:01:42 -04:00
Andrei Betlen
db982a861f
Fix
2023-08-24 01:01:12 -04:00
Andrei Betlen
4ed632c4b3
Remove deprecated params
2023-08-24 01:01:05 -04:00
Andrei Betlen
cf405f6764
Merge branch 'main' into v0.2-wip
2023-08-24 00:30:51 -04:00
Andrei Betlen
bbbf0f4fc4
Update llama.cpp
2023-08-24 00:17:00 -04:00
Andrei Betlen
e632c59fa0
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-08-17 20:53:04 -04:00
c0sogi
a240aa6b25
Fix typos in llama_grammar
2023-08-17 21:00:44 +09:00
Andrei Betlen
620cd2fd69
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-08-14 22:41:47 -04:00
Andrei Betlen
5788f1f2b2
Remove unused import
2023-08-14 22:41:37 -04:00
Andrei
6dfb98117e
Merge pull request #600 from Vuizur/main
Add py.typed to conform with PEP 561
2023-08-14 22:40:41 -04:00
Andrei
b99e758045
Merge pull request #604 from aliencaocao/main-1
Add doc string for n_gpu_layers argument and make -1 offload all layers
2023-08-14 22:40:10 -04:00
Andrei Betlen
b345d60987
Update llama.cpp
2023-08-14 22:33:30 -04:00
Billy Cao
c471871d0b
make n_gpu_layers=-1 offload all layers
2023-08-13 11:21:28 +08:00
Billy Cao
d018c7b01d
Add doc string for n_gpu_layers argument
2023-08-12 18:41:47 +08:00
Hannes Krumbiegel
17dd7fa8e0
Add py.typed
2023-08-11 09:58:48 +02:00
MeouSker77
88184ed217
fix CJK output again
2023-08-09 22:04:35 +08:00
Andrei Betlen
66fb0345e8
Move grammar to function call argument
2023-08-08 15:08:54 -04:00
Andrei Betlen
1e844d3238
fix
2023-08-08 15:07:28 -04:00
Andrei Betlen
843b7ccd90
Merge branch 'main' into c0sogi/main
2023-08-08 14:43:02 -04:00
Andrei Betlen
d015bdb4f8
Add mul_mat_q option
2023-08-08 14:35:06 -04:00
Andrei Betlen
f6a7850e1a
Update llama.cpp
2023-08-08 14:30:58 -04:00
c0sogi
0d7d2031a9
prevent memory access error by llama_grammar_free
2023-08-07 17:02:33 +09:00
c0sogi
b07713cb9f
reset grammar for every generation
2023-08-07 15:16:25 +09:00
c0sogi
418aa83b01
Added grammar based sampling
2023-08-07 02:21:37 +09:00
c0sogi
ac188a21f3
Added low level grammar API
2023-08-05 14:43:35 +09:00
Andrei Betlen
ce57920e60
Suppress llama.cpp output when loading model.
2023-07-28 14:45:18 -04:00
Andrei Betlen
a9b9f0397c
Format
2023-07-28 01:53:08 -04:00
Andrei Betlen
abc538fcd5
fix: annoying bug where attribute exceptions were drowning out file-not-found exceptions
2023-07-28 01:43:00 -04:00
Shouyi Wang
426dbfe3f4
Change tensor_split from array to pointer
2023-07-25 18:29:59 +10:00
Andrei Betlen
078902a6fe
Add llama_grammar_accept_token
2023-07-24 15:55:26 -04:00
Andrei Betlen
bf901773b0
Add llama_sample_grammar
2023-07-24 15:42:31 -04:00
Andrei Betlen
1b6997d69f
Convert constants to python types and allow python types in low-level api
2023-07-24 15:42:07 -04:00
Andrei Betlen
343480364f
Merge branch 'main' into v0.2-wip
2023-07-24 15:26:08 -04:00
Andrei Betlen
11dd2bf382
Add temporary rms_norm_eps parameter
2023-07-24 14:09:24 -04:00
Andrei Betlen
8cd64d4ac3
Add rms_eps_norm
2023-07-24 13:52:12 -04:00
bretello
0f09f10e8c
add support for llama2 70b
2023-07-24 19:38:24 +02:00
Andrei Betlen
77c9f496b0
Merge branch 'main' into v0.2-wip
2023-07-24 13:19:54 -04:00
Andrei Betlen
401309d11c
Revert "Merge pull request #521 from bretello/main"
This reverts commit 07f0f3a386, reversing changes made to d8a3ddbb1c.
2023-07-24 13:11:10 -04:00
Andrei
07f0f3a386
Merge pull request #521 from bretello/main
raise exception when `llama_load_model_from_file` fails
2023-07-24 13:09:28 -04:00
Andrei Betlen
d8a3ddbb1c
Update llama.cpp
2023-07-24 13:08:06 -04:00
Andrei Betlen
985d559971
Update llama.cpp
2023-07-24 13:04:34 -04:00
bretello
8be7d67f7e
raise exception when llama_load_model_from_file fails
2023-07-24 14:42:37 +02:00
Andrei Betlen
436036aa67
Merge branch 'main' into v0.2-wip
2023-07-21 12:42:38 -04:00
Andrei Betlen
b83728ad1e
Update llama.cpp
2023-07-21 12:33:27 -04:00
Andrei Betlen
0538ba1dab
Merge branch 'main' into v0.2-wip
2023-07-20 19:06:26 -04:00
Andrei Betlen
01435da740
Update llama.cpp
2023-07-20 18:54:25 -04:00
Andrei Betlen
28a111704b
Fix compatibility with older python versions
2023-07-20 18:52:10 -04:00
Andrei Betlen
d10ce62714
Revert ctypes argtype change
2023-07-20 18:51:53 -04:00
Andrei
365d9a4367
Merge pull request #481 from c0sogi/main
Added `RouteErrorHandler` for server
2023-07-20 17:41:42 -04:00
Vinicius
a8551477f5
Update llama_cpp.py - Fix c_char_p to Array[c_char_p] and c_float to Array[c_float]
2023-07-20 17:29:11 -03:00
Carlos Tejada
0756a2d3fb
Now the last token is sent when stream=True
2023-07-19 22:47:14 -04:00
Andrei Betlen
0b121a7456
Format
2023-07-19 03:48:27 -04:00
Andrei Betlen
b43917c144
Add functions parameters
2023-07-19 03:48:20 -04:00
Andrei Betlen
19ba9d3845
Use numpy arrays for logits_processors and stopping_criteria. Closes #491
2023-07-18 19:27:41 -04:00
shutup
5ed8bf132f
expose RoPE param to server start
2023-07-18 16:34:36 +08:00
c0sogi
1551ba10bd
Added RouteErrorHandler for server
2023-07-16 14:57:39 +09:00
Andrei Betlen
8ab098e49d
Re-order Llama class params
2023-07-15 15:35:08 -04:00
Andrei Betlen
e4f9db37db
Fix context_params struct layout
2023-07-15 15:34:55 -04:00
Andrei Betlen
f0797a6054
Merge branch 'main' into custom_rope
2023-07-15 15:11:01 -04:00
randoentity
3f8f276f9f
Add bindings for custom_rope
2023-07-10 17:37:46 +02:00
Andrei Betlen
a86bfdf0a5
bugfix: truncate completion max_tokens to fit context length by default
2023-07-09 18:13:29 -04:00
Andrei Betlen
6f70cc4b7d
bugfix: pydantic settings missing / changed fields
2023-07-09 18:03:31 -04:00
Andrei
5d756de314
Merge branch 'main' into add_unlimited_max_tokens
2023-07-08 02:37:38 -04:00
Andrei
b8e0bed295
Merge pull request #453 from wu-qing-157/main
Fix incorrect token_logprobs (due to indexing after sorting)
2023-07-08 02:31:52 -04:00
Andrei Betlen
d6e6aad927
bugfix: fix compatibility bug with openai api on last token
2023-07-08 00:06:11 -04:00
Andrei Betlen
4f2b5d0b53
Format
2023-07-08 00:05:10 -04:00
Andrei Betlen
34c505edf2
perf: convert pointer to byref
2023-07-07 22:54:07 -04:00
Andrei Betlen
52753b77f5
Upgrade fastapi to 0.100.0 and pydantic v2
2023-07-07 21:38:46 -04:00
Andrei Betlen
11eae75211
perf: avoid allocating new buffers during sampling
2023-07-07 19:28:53 -04:00
Andrei Betlen
a14d8a9b3f
perf: assign to candidates data structure instead
2023-07-07 18:58:43 -04:00
wu-qing-157
9e61661518
fix indexing token_logprobs after sorting
2023-07-07 10:18:49 +00:00
Andrei Betlen
57d8ec3899
Add setting to control request interruption
2023-07-07 03:37:23 -04:00
Andrei Betlen
4c7cdcca00
Add interruptible streaming requests for llama-cpp-python server. Closes #183
2023-07-07 03:04:17 -04:00
Andrei Betlen
98ae4e58a3
Update llama.cpp
2023-07-06 17:57:56 -04:00
Andrei Betlen
b994296c75
Update llama.cpp
2023-07-05 01:00:14 -04:00
Andrei Betlen
c67f786360
Update llama.cpp
2023-06-29 01:08:15 -04:00
Andrei Betlen
e34f4414cf
Hotfix: logits_all bug
2023-06-29 00:57:27 -04:00
Andrei Betlen
a2ede37bd5
Load logits directly into scores buffer
2023-06-29 00:45:46 -04:00
Andrei Betlen
b95b0ffbeb
Use pre-allocated buffers to store input_ids and scores
2023-06-29 00:40:47 -04:00
Andrei Betlen
a5e059c053
Free model when llama is unloaded. Closes #434
2023-06-28 23:58:55 -04:00
Andrei Betlen
3379dc40a1
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-06-26 08:50:48 -04:00
Andrei Betlen
952228407e
Update llama.cpp
2023-06-26 08:50:38 -04:00
Andrei Betlen
b4a3db3e54
Update type signature
2023-06-26 08:50:30 -04:00
Andrei
5eb4ebb041
Merge branch 'main' into fix-state-pickle
2023-06-26 08:45:02 -04:00
samfundev
d788fb49bf
Only concatenate after all batches are done
2023-06-24 15:51:46 -04:00
Andrei
877ca6d016
Merge branch 'main' into fix-state-pickle
2023-06-23 15:13:07 -04:00
Alexey
282698b6d3
server: pass seed param from command line to llama
2023-06-23 00:19:24 +04:00
Andrei Betlen
e37798777e
Update llama.cpp
2023-06-20 11:25:10 -04:00
Andrei Betlen
d410f12fae
Update docs. Closes #386
2023-06-17 13:38:48 -04:00
Andrei Betlen
9f528f4715
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-06-17 13:37:17 -04:00
Andrei Betlen
d7153abcf8
Update llama.cpp
2023-06-16 23:11:14 -04:00
imaprogrammer
fd9f294b3a
Update llama.py: include input token count in ValueError exception
2023-06-16 14:11:57 +05:30
Andrei Betlen
1e20be6d0c
Add low_vram to server settings
2023-06-14 22:13:42 -04:00
Andrei Betlen
44b83cada5
Add low_vram parameter
2023-06-14 22:12:33 -04:00
Andrei Betlen
f7c5cfaf50
Format server options
2023-06-14 22:08:28 -04:00
Andrei Betlen
9c41a3e990
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-06-14 21:50:43 -04:00
Andrei
f568baeef1
Merge pull request #351 from player1537-forks/th/add-logits-bias-parameter
Add support for `logit_bias` and `logit_bias_type` parameters
2023-06-14 21:49:56 -04:00
Andrei Betlen
f27393ab7e
Add additional verbose logs for cache
2023-06-14 21:46:48 -04:00
Andrei Betlen
4cefb70cd0
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-06-14 21:40:19 -04:00
Andrei Betlen
715f98c591
Update llama.cpp
2023-06-14 21:40:13 -04:00
Okabintaro
10b0cb727b
fix: Make LlamaState picklable for disk cache
I fixed the issue by making the saved state a bytes object instead of the ctypes one, which can't be pickled.
2023-06-13 12:03:31 +02:00
Gabor
3129a0e7e5
correction to add back environment variable support <3 docker
2023-06-11 01:11:24 +01:00
Gabor
3ea31930e5
fixes abetlen/llama-cpp-python #358
2023-06-11 00:58:08 +01:00
Andrei Betlen
21acd7901f
Re-enable cache
2023-06-10 12:22:31 -04:00
Andrei Betlen
6639371407
Update llama.cpp
2023-06-10 12:17:38 -04:00
Tanner Hobson
eb7645b3ba
Add support for logit_bias and logit_bias_type parameters
2023-06-09 13:13:08 -04:00
Andrei Betlen
0da655b3be
Temporarily disable cache until save state bug is fixed.
2023-06-09 11:10:24 -04:00
Andrei Betlen
556c7edf47
Truncate max_tokens if it exceeds context length
2023-06-09 10:57:36 -04:00
Andrei Betlen
0c42168508
Fix cache implementation breaking changes
2023-06-08 13:19:23 -04:00
Andrei Betlen
607d217caa
Allow both .so and .dylib extensions for macos
2023-06-08 00:27:19 -04:00
Andrei
0f0b447fa4
Merge pull request #289 from Maximilian-Winter/main
Diskcache implementation for llama state.
2023-06-06 17:03:03 -04:00
Andrei
d508573fb4
Merge pull request #328 from spirilis/mirostat
Added mirostat support for completions, chat completions API
2023-06-06 16:58:23 -04:00
Andrei Betlen
aad4b17f52
Update llama.cpp
2023-06-06 16:23:55 -04:00
Andrei Betlen
8b4968ea67
Fix resize issue. Closes #330
2023-06-06 11:37:57 -04:00
Eric B
9b1c9e902c
Added mirostat support for completions, chat completions API
2023-06-05 22:37:11 -04:00
Andrei Betlen
7b57420ea9
Update llama.cpp
2023-06-05 18:17:29 -04:00
Maximilian-Winter
29f9c9cca3
Added both LlamaCache classes, Disk and RAM.
2023-05-31 22:33:56 +02:00
Maximilian Winter
9ea7a379d3
Merge branch 'abetlen:main' into main
2023-05-31 12:55:51 +02:00
Andrei
49fe9395a1
Merge pull request #277 from abetlen/add-numpy-support
Use numpy for internal buffers
2023-05-29 20:59:30 -04:00
Maximilian-Winter
719c3eae0a
Diskcache implementation for llama state.
2023-05-28 15:56:38 +02:00
Andrei Betlen
80066f0b80
Use async routes
2023-05-27 09:12:58 -04:00
Andrei Betlen
c2b59a5f59
Remove unused import
2023-05-26 22:59:29 -04:00
Andrei Betlen
8f2b4456ad
Format
2023-05-26 22:04:31 -04:00
Andrei Betlen
84e313bd6e
Align dtype to match c structs
2023-05-26 22:02:16 -04:00
Andrei Betlen
66bcb8d70d
Merge branch 'main' into add-numpy-support
2023-05-26 20:25:03 -04:00
Andrei Betlen
8f35bddd7e
Fix stop sequence performance bug.
2023-05-26 20:23:49 -04:00
Andrei Betlen
7fc7bc30e7
Remove usage of eval_tokens for cache check
2023-05-26 20:12:05 -04:00
Andrei Betlen
fe331ec589
Replace eval_logits and eval_tokens with numpy arrays
2023-05-26 20:03:31 -04:00
Andrei Betlen
8eb9769f78
Add support for numpy
2023-05-26 16:12:45 -04:00
Andrei Betlen
4c1b7f7a76
Bugfix for logits_processor and stopping_criteria
2023-05-26 10:25:28 -04:00
Andrei Betlen
433a2e3e8a
Add extra logits_processor and stopping_criteria
2023-05-26 03:13:24 -04:00
Andrei Betlen
f74b90ed67
Fix streaming hang on last token when cache is on.
2023-05-26 03:03:01 -04:00
Andrei Betlen
5be8354e11
Added tokenizer
2023-05-26 03:00:51 -04:00
Andrei Betlen
8fa2ef1959
Format
2023-05-26 03:00:35 -04:00
Andrei Betlen
6bd1075291
Merge branch 'Maximilian-Winter/main' into main
2023-05-26 02:56:11 -04:00
Andrei Betlen
ca01f98e09
Add LlamaTokenizer class
2023-05-25 14:11:33 -04:00
Andrei Betlen
1d247e0f35
Add StoppingCriteria and LogitsProcessor to generate to match huggingface API
2023-05-25 14:04:54 -04:00
Maximilian-Winter
c2585b6889
Fixed list elements typing
2023-05-25 10:54:08 +02:00
Maximilian-Winter
da463e6c8c
Added types to logit processor list and stop criteria list
2023-05-25 09:07:16 +02:00
Maximilian-Winter
c05fcdf42f
Fixed None value of logits processors.
2023-05-24 22:02:06 +02:00
Maximilian-Winter
5bb780d455
Implemented logit processors and stopping criteria
2023-05-24 21:55:44 +02:00
Andrei Betlen
fab064ded9
Remove unnecessary ffi calls
2023-05-23 17:56:21 -04:00
Andrei Betlen
0adb9ec37a
Use model_name and index in response
2023-05-21 21:30:03 -04:00
Andrei Betlen
922b5b2bfd
Merge branch 'main' into server-embedding
2023-05-21 21:21:38 -04:00
Andrei Betlen
cd102e9da1
Cache shared library function calls for static tokens
2023-05-21 19:18:56 -04:00
Andrei Betlen
b895511cca
Fix penalize_nl
2023-05-21 18:38:06 -04:00
Andrei Betlen
03e2947b03
Fix unnecessary memory allocation while sampling
2023-05-21 18:36:34 -04:00
Andrei Betlen
fafe47114c
Update llama.cpp
2023-05-21 17:47:21 -04:00
Andrei Betlen
76b1d2cd20
Change properties to functions to match token functions
2023-05-20 08:24:06 -04:00
Andrei Betlen
a7ba85834f
Add n_ctx, n_vocab, and n_embd properties
2023-05-20 08:13:41 -04:00
Simon Chabot
e783f1c191
feat: make embedding support list of string as input
Makes the /v1/embedding route similar to the OpenAI API.
2023-05-20 01:23:32 +02:00
Andrei Betlen
01a010be52
Fix llama_cpp and Llama type signatures. Closes #221
2023-05-19 11:59:33 -04:00
Andrei Betlen
a8cd169251
Bugfix: Stop sequences can be strings
2023-05-19 03:15:08 -04:00
Andrei Betlen
17d4271b04
Fix logprobs for completions and implement for streaming logprobs.
2023-05-19 02:20:27 -04:00
Andrei Betlen
a634a2453b
Allow first logprob token to be null to match openai api
2023-05-19 02:04:57 -04:00
Andrei Betlen
dc39cc0fa4
Use server sent events function for streaming completion
2023-05-19 02:04:30 -04:00
Andrei Betlen
f0ec6e615e
Stream tokens instead of text chunks
2023-05-18 11:35:59 -04:00
Andrei Betlen
21d8f5fa9f
Remove unused union
2023-05-18 11:35:15 -04:00
Andrei Betlen
61d58e7b35
Check for CUDA_PATH before adding
2023-05-17 15:26:38 -04:00
Andrei Betlen
7c95895626
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-05-17 15:19:32 -04:00
Aneesh Joy
e9794f91f2
Fixed CUBLAS DLL load issue on Windows
2023-05-17 18:04:58 +01:00
Andrei Betlen
4f342795e5
Update token checks
2023-05-17 03:35:13 -04:00
Andrei Betlen
f5c2f998ab
Format
2023-05-17 02:00:39 -04:00
Andrei Betlen
d28b753ed2
Implement penalize_nl
2023-05-17 01:53:26 -04:00
Andrei Betlen
f11e2a781c
Fix last_n_tokens_size
2023-05-17 01:42:51 -04:00
Andrei Betlen
7e55244540
Fix top_k value. Closes #220
2023-05-17 01:41:42 -04:00
Andrei Betlen
a7c9e38287
Update variable name
2023-05-16 18:07:25 -04:00
Andrei Betlen
a3352923c7
Add model_alias option to override model_path in completions. Closes #39
2023-05-16 17:22:00 -04:00
Andrei Betlen
a65125c0bd
Add sampling defaults for generate
2023-05-16 09:35:50 -04:00
Andrei Betlen
cbac19bf24
Add winmode arg only on windows if python version supports it
2023-05-15 09:15:01 -04:00
Andrei Betlen
c804efe3f0
Fix obscure Windows DLL issue. Closes #208
2023-05-14 22:08:11 -04:00
Andrei Betlen
cdf59768f5
Update llama.cpp
2023-05-14 00:04:22 -04:00
Andrei Betlen
7a536e86c2
Allow model to tokenize strings longer than context length and set add_bos. Closes #92
2023-05-12 14:28:22 -04:00
Andrei Betlen
8740ddc58e
Only support generating one prompt at a time.
2023-05-12 07:21:46 -04:00
Andrei Betlen
8895b9002a
Revert "llama_cpp server: prompt is a string". Closes #187
This reverts commit b9098b0ef7.
2023-05-12 07:16:57 -04:00
Andrei Betlen
7be584fe82
Add missing tfs_z parameter
2023-05-11 21:56:19 -04:00
Andrei Betlen
cdeaded251
Bugfix: Ensure logs are printed when streaming
2023-05-10 16:12:17 -04:00
Lucas Doyle
02e8a018ae
llama_cpp server: document presence_penalty and frequency_penalty, mark as supported
2023-05-09 16:25:00 -07:00
Andrei Betlen
d957422bf4
Implement sampling as in llama.cpp main example
2023-05-08 21:21:25 -04:00
Andrei Betlen
93a9019bb1
Merge branch 'main' of github.com:abetlen/llama_cpp_python into Maximilian-Winter/main
2023-05-08 19:57:09 -04:00
Andrei Betlen
82d138fe54
Fix: default repeat_penalty
2023-05-08 18:49:11 -04:00
Andrei Betlen
29f094bbcf
Bugfix: not falling back to environment variables when a default value is set.
2023-05-08 14:46:25 -04:00
Andrei Betlen
0d6c60097a
Show default value when --help is called
2023-05-08 14:21:15 -04:00
Andrei Betlen
022e9ebcb8
Use environment variable if parsed cli arg is None
2023-05-08 14:20:53 -04:00
Andrei Betlen
0d751a69a7
Set repeat_penalty to 0 by default
2023-05-08 01:50:43 -04:00
Andrei Betlen
65d9cc050c
Add openai frequency and presence penalty parameters. Closes #169
2023-05-08 01:30:18 -04:00
Andrei Betlen
a0b61ea2a7
Bugfix for models endpoint
2023-05-07 20:17:52 -04:00
Andrei Betlen
e72f58614b
Change pointer to lower overhead byref
2023-05-07 20:01:34 -04:00
Andrei Betlen
14da46f16e
Added cache size to settings object.
2023-05-07 19:33:17 -04:00
Andrei Betlen
0e94a70de1
Add in-memory longest prefix cache. Closes #158
2023-05-07 19:31:26 -04:00
Andrei Betlen
8dfde63255
Fix return type
2023-05-07 19:30:14 -04:00
Andrei Betlen
2753b85321
Format
2023-05-07 13:19:56 -04:00
Andrei Betlen
627811ea83
Add verbose flag to server
2023-05-07 05:09:10 -04:00
Andrei Betlen
3fbda71790
Fix mlock_supported and mmap_supported return type
2023-05-07 03:04:22 -04:00
Andrei Betlen
5a3413eee3
Update cpu_count
2023-05-07 03:03:57 -04:00
Andrei Betlen
1a00e452ea
Update settings fields and defaults
2023-05-07 02:52:20 -04:00
Andrei Betlen
86753976c4
Revert "llama_cpp server: delete some ignored / unused parameters"
This reverts commit b47b9549d5.
2023-05-07 02:02:34 -04:00
Andrei Betlen
c382d8f86a
Revert "llama_cpp server: mark model as required"
This reverts commit e40fcb0575.
2023-05-07 02:00:22 -04:00
Andrei Betlen
d8fddcce73
Merge branch 'main' of github.com:abetlen/llama_cpp_python into better-server-params-and-fields
2023-05-07 01:54:00 -04:00
Andrei Betlen
7c3743fe5f
Update llama.cpp
2023-05-07 00:12:47 -04:00
Andrei Betlen
bc853e3742
Fix type for eval_logits in LlamaState object
2023-05-06 21:32:50 -04:00
Maximilian Winter
515d9bde7e
Fixed some things and activated cuBLAS
2023-05-06 23:40:19 +02:00
Maximilian Winter
aa203a0d65
Added mirostat sampling to the high level API.
2023-05-06 22:47:47 +02:00
Andrei Betlen
98bbd1c6a8
Fix eval logits type
2023-05-05 14:23:14 -04:00
Andrei Betlen
b5f3e74627
Add return type annotations for embeddings and logits
2023-05-05 14:22:55 -04:00
Andrei Betlen
3e28e0e50c
Fix: runtime type errors
2023-05-05 14:12:26 -04:00
Andrei Betlen
e24c3d7447
Prefer explicit imports
2023-05-05 14:05:31 -04:00
Andrei Betlen
40501435c1
Fix: types
2023-05-05 14:04:12 -04:00
Andrei Betlen
66e28eb548
Fix temperature bug
2023-05-05 14:00:41 -04:00
Andrei Betlen
6702d2abfd
Fix candidates type
2023-05-05 14:00:30 -04:00
Andrei Betlen
5e7ddfc3d6
Fix llama_cpp types
2023-05-05 13:54:22 -04:00
Andrei Betlen
b6a9a0b6ba
Add types for all low-level api functions
2023-05-05 12:22:27 -04:00
Andrei Betlen
5be0efa5f8
Cache should raise KeyError when key is missing
2023-05-05 12:21:49 -04:00
Andrei Betlen
24fc38754b
Add cli options to server. Closes #37
2023-05-05 12:08:28 -04:00
Andrei Betlen
853dc711cc
Format
2023-05-04 21:58:36 -04:00
Andrei Betlen
97c6372350
Rewind model to longest prefix.
2023-05-04 21:58:27 -04:00
Andrei Betlen
329297fafb
Bugfix: Missing logits_to_logprobs
2023-05-04 12:18:40 -04:00
Lucas Doyle
3008a954c1
Merge branch 'main' of github.com:abetlen/llama-cpp-python into better-server-params-and-fields
2023-05-03 13:10:03 -07:00
Andrei Betlen
9e5b6d675a
Improve logging messages
2023-05-03 10:28:10 -04:00
Andrei Betlen
43f2907e3a
Support smaller state sizes
2023-05-03 09:33:50 -04:00
Andrei Betlen
1d47cce222
Update llama.cpp
2023-05-03 09:33:30 -04:00
Lucas Doyle
b9098b0ef7
llama_cpp server: prompt is a string
Not sure why this union type was here, but looking at llama.py, prompt is only ever processed as a string for completion.
This was breaking types when generating an OpenAPI client.
2023-05-02 14:47:07 -07:00
Matt Hoffner
f97ff3c5bb
Update llama_cpp.py
2023-05-01 20:40:06 -07:00
Andrei
7ab08b8d10
Merge branch 'main' into better-server-params-and-fields
2023-05-01 22:45:57 -04:00
Andrei Betlen
9eafc4c49a
Refactor server to use factory
2023-05-01 22:38:46 -04:00
Andrei Betlen
dd9ad1c759
Formatting
2023-05-01 21:51:16 -04:00
Lucas Doyle
dbbfc4ba2f
llama_cpp server: fix to ChatCompletionRequestMessage
When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage.
These fix that:
- I think `Union[Literal["user"], Literal["channel"], ...]` is the same as `Literal["user", "channel", ...]`
- Turns out the default value `Literal["user"]` isn't JSON serializable, so it's replaced with "user"
2023-05-01 15:38:19 -07:00
Lucas Doyle
fa2a61e065
llama_cpp server: fields for the embedding endpoint
2023-05-01 15:38:19 -07:00
Lucas Doyle
8dcbf65a45
llama_cpp server: define fields for chat completions
Slight refactor for common fields shared between completion and chat completion.
2023-05-01 15:38:19 -07:00
Lucas Doyle
978b6daf93
llama_cpp server: add some more information to fields for completions
2023-05-01 15:38:19 -07:00
Lucas Doyle
a5aa6c1478
llama_cpp server: add missing top_k param to CreateChatCompletionRequest
`llama.create_chat_completion` definitely has a `top_k` argument, but it's missing from `CreateChatCompletionRequest`. Decision: add it.
2023-05-01 15:38:19 -07:00
Lucas Doyle
1e42913599
llama_cpp server: move logprobs to supported
I think this is actually supported (it's in the arguments of `Llama.__call__`, which is how the completion is invoked). Decision: mark as supported.
2023-05-01 15:38:19 -07:00
Lucas Doyle
b47b9549d5
llama_cpp server: delete some ignored / unused parameters
`n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. Decision: delete them.
2023-05-01 15:38:19 -07:00
Lucas Doyle
e40fcb0575
llama_cpp server: mark model as required
`model` is ignored but currently marked "optional". On the one hand, it could be marked "required" to make it explicit in case the server supports multiple llamas at the same time, but it could also be deleted since it's ignored. Decision: mark it required for the sake of OpenAI API compatibility.
I think out of all parameters, `model` is probably the most important one for people to keep using even if it's ignored for now.
2023-05-01 15:38:19 -07:00
Andrei Betlen
b6747f722e
Fix logprob calculation. Fixes #134
2023-05-01 17:45:08 -04:00
Andrei Betlen
9ff9cdd7fc
Fix import error
2023-05-01 15:11:15 -04:00
Andrei Betlen
350a1769e1
Update sampling api
2023-05-01 14:47:55 -04:00
Andrei Betlen
7837c3fdc7
Fix return types and import comments
2023-05-01 14:02:06 -04:00
Andrei Betlen
ccf1ed54ae
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-05-01 11:35:14 -04:00
Andrei Betlen
80184a286c
Update llama.cpp
2023-05-01 10:44:28 -04:00
Lucas Doyle
efe8e6f879
llama_cpp server: slight refactor to init_llama function
Define an init_llama function that starts llama with supplied settings instead of just doing it in the global context of app.py.
This allows tests to be less brittle by not needing to mess with os.environ and then import the app.
2023-04-29 11:42:23 -07:00
Lucas Doyle
6d8db9d017
tests: simple test for server module
2023-04-29 11:42:20 -07:00
Lucas Doyle
468377b0e2
llama_cpp server: app is now importable, still runnable as a module
2023-04-29 11:41:25 -07:00
Andrei
755f9fa455
Merge pull request #118 from SagsMug/main
Fix UnicodeDecodeError permanently
2023-04-29 07:19:01 -04:00
Mug
18a0c10032
Remove excessive errors="ignore" and add utf8 test
2023-04-29 12:19:22 +02:00
Andrei Betlen
ea0faabae1
Update llama.cpp
2023-04-28 15:32:43 -04:00
Mug
b7d14efc8b
Python weirdness
2023-04-28 13:20:31 +02:00
Mug
eed61289b6
Don't detect off tokens, detect off detokenized utf8
2023-04-28 13:16:18 +02:00
Mug
3a98747026
One day, I'll fix off-by-1 errors permanently too
2023-04-28 12:54:28 +02:00
Mug
c39547a986
Detect multi-byte responses and wait
2023-04-28 12:50:30 +02:00
Andrei Betlen
9339929f56
Update llama.cpp
2023-04-26 20:00:54 -04:00
Mug
5f81400fcb
Also ignore errors on input prompts
2023-04-26 14:45:51 +02:00
Mug
be2c961bc9
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python
2023-04-26 14:38:09 +02:00
Mug
c4a8491d42
Fix decode errors permanently
2023-04-26 14:37:06 +02:00
Andrei Betlen
cbd26fdcc1
Update llama.cpp
2023-04-25 19:03:41 -04:00
Andrei Betlen
3cab3ef4cb
Update n_batch for server
2023-04-25 09:11:32 -04:00
Andrei Betlen
cc706fb944
Add ctx check and re-order __init__. Closes #112
2023-04-25 09:00:53 -04:00
Andrei Betlen
d484c5634e
Bugfix: Check cache keys as prefix to prompt tokens
2023-04-24 22:18:54 -04:00
Andrei Betlen
cbe95bbb75
Add cache implementation using llama state
2023-04-24 19:54:41 -04:00
Andrei Betlen
2c359a28ff
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-04-24 17:51:27 -04:00
Andrei Betlen
197cf80601
Add save/load state api for Llama class
2023-04-24 17:51:25 -04:00
Andrei Betlen
86f8e5ad91
Refactor internal state for Llama class
2023-04-24 15:47:54 -04:00
Andrei
f37456133a
Merge pull request #108 from eiery/main
Update n_batch default to 512 to match upstream llama.cpp
2023-04-24 13:48:09 -04:00
Andrei Betlen
02cf881317
Update llama.cpp
2023-04-24 09:30:10 -04:00
eiery
aa12d8a81f
Update llama.py
Update n_batch default to 512 to match upstream llama.cpp
2023-04-23 20:56:40 -04:00
Andrei Betlen
7230599593
Disable mmap when applying lora weights. Closes #107
2023-04-23 14:53:17 -04:00
Andrei Betlen
e99caedbbd
Update llama.cpp
2023-04-22 19:50:28 -04:00
Andrei Betlen
1eb130a6b2
Update llama.cpp
2023-04-21 17:40:27 -04:00
Andrei Betlen
e4647c75ec
Add use_mmap flag to server
2023-04-19 15:57:46 -04:00
Andrei Betlen
0df4d69c20
If lora base is not set avoid re-loading the model by passing NULL
2023-04-18 23:45:25 -04:00
Andrei Betlen
95c0dc134e
Update type signature to allow for null pointer to be passed.
2023-04-18 23:44:46 -04:00
Andrei Betlen
453e517fd5
Add separate lora_base path for applying LoRA to quantized models using original unquantized model weights.
2023-04-18 10:20:46 -04:00
Andrei Betlen
eb7f278cc6
Add lora_path parameter to Llama model
2023-04-18 01:43:44 -04:00
Andrei Betlen
35abf89552
Add bindings for LoRA adapters. Closes #88
2023-04-18 01:30:04 -04:00
Andrei Betlen
89856ef00d
Bugfix: only eval new tokens
2023-04-15 17:32:53 -04:00
Andrei Betlen
92c077136d
Add experimental cache
2023-04-15 12:03:09 -04:00
Andrei Betlen
a6372a7ae5
Update stop sequences for chat
2023-04-15 12:02:48 -04:00
Andrei Betlen
83b2be6dc4
Update chat parameters
2023-04-15 11:58:43 -04:00
Andrei Betlen
62087514c6
Update chat prompt
2023-04-15 11:58:19 -04:00
Andrei Betlen
02f9fb82fb
Bugfix
2023-04-15 11:39:52 -04:00
Andrei Betlen
3cd67c7bd7
Add type annotations
2023-04-15 11:39:21 -04:00
Andrei Betlen
d7de0e8014
Bugfix
2023-04-15 00:08:04 -04:00
Andrei Betlen
e90e122f2a
Use clear
2023-04-14 23:33:18 -04:00
Andrei Betlen
ac7068a469
Track generated tokens internally
2023-04-14 23:33:00 -04:00
Andrei Betlen
6e298d8fca
Set kv cache size to f16 by default
2023-04-14 22:21:19 -04:00
Andrei Betlen
6c7cec0c65
Fix completion request
2023-04-14 10:01:15 -04:00
Andrei Betlen
6153baab2d
Clean up logprobs implementation
2023-04-14 09:59:33 -04:00
Andrei Betlen
26cc4ee029
Fix signature for stop parameter
2023-04-14 09:59:08 -04:00
Andrei Betlen
6595ad84bf
Add field to disable resetting between generations
2023-04-13 00:28:00 -04:00
Andrei Betlen
22fa5a621f
Revert "Deprecate generate method"
This reverts commit 6cf5876538.
2023-04-13 00:19:55 -04:00
Andrei Betlen
4f5f99ef2a
Formatting
2023-04-12 22:40:12 -04:00
Andrei Betlen
0daf16defc
Enable logprobs on completion endpoint
2023-04-12 19:08:11 -04:00
Andrei Betlen
19598ac4e8
Fix threading bug. Closes #62
2023-04-12 19:07:53 -04:00
Andrei Betlen
005c78d26c
Update llama.cpp
2023-04-12 14:29:00 -04:00