Lucas Doyle
fa2a61e065
llama_cpp server: fields for the embedding endpoint
2023-05-01 15:38:19 -07:00
Lucas Doyle
8dcbf65a45
llama_cpp server: define fields for chat completions
...
Slight refactor for common fields shared between completion and chat completion
2023-05-01 15:38:19 -07:00
Lucas Doyle
978b6daf93
llama_cpp server: add some more information to fields for completions
2023-05-01 15:38:19 -07:00
Lucas Doyle
a5aa6c1478
llama_cpp server: add missing top_k param to CreateChatCompletionRequest
...
`llama.create_chat_completion` definitely has a `top_k` argument, but its missing from `CreateChatCompletionRequest`. decision: add it
2023-05-01 15:38:19 -07:00
Lucas Doyle
1e42913599
llama_cpp server: move logprobs to supported
...
I think this is actually supported (its in the arguments of `LLama.__call__`, which is how the completion is invoked). decision: mark as supported
2023-05-01 15:38:19 -07:00
Lucas Doyle
b47b9549d5
llama_cpp server: delete some ignored / unused parameters
...
`n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. decision: delete it
2023-05-01 15:38:19 -07:00
Lucas Doyle
e40fcb0575
llama_cpp server: mark model as required
...
`model` is ignored, but currently marked "optional"... on the one hand could mark "required" to make it explicit in case the server supports multiple llama's at the same time, but also could delete it since its ignored. decision: mark it required for the sake of openai api compatibility.
I think out of all parameters, `model` is probably the most important one for people to keep using even if its ignored for now.
2023-05-01 15:38:19 -07:00
Andrei Betlen
9ff9cdd7fc
Fix import error
2023-05-01 15:11:15 -04:00
Lucas Doyle
efe8e6f879
llama_cpp server: slight refactor to init_llama function
...
Define an init_llama function that starts llama with supplied settings instead of just doing it in the global context of app.py
This allows the test to be less brittle by not needing to mess with os.environ, then importing the app
2023-04-29 11:42:23 -07:00
Lucas Doyle
6d8db9d017
tests: simple test for server module
2023-04-29 11:42:20 -07:00
Lucas Doyle
468377b0e2
llama_cpp server: app is now importable, still runnable as a module
2023-04-29 11:41:25 -07:00
Andrei Betlen
3cab3ef4cb
Update n_batch for server
2023-04-25 09:11:32 -04:00
Andrei Betlen
e4647c75ec
Add use_mmap flag to server
2023-04-19 15:57:46 -04:00
Andrei Betlen
92c077136d
Add experimental cache
2023-04-15 12:03:09 -04:00
Andrei Betlen
6c7cec0c65
Fix completion request
2023-04-14 10:01:15 -04:00
Andrei Betlen
4f5f99ef2a
Formatting
2023-04-12 22:40:12 -04:00
Andrei Betlen
0daf16defc
Enable logprobs on completion endpoint
2023-04-12 19:08:11 -04:00
Andrei Betlen
19598ac4e8
Fix threading bug. Closes #62
2023-04-12 19:07:53 -04:00
Andrei Betlen
b3805bb9cc
Implement logprobs parameter for text completion. Closes #2
2023-04-12 14:05:11 -04:00
Andrei Betlen
213cc5c340
Remove async from function signature to avoid blocking the server
2023-04-11 11:54:31 -04:00
Andrei Betlen
0067c1a588
Formatting
2023-04-08 16:01:18 -04:00
Andrei Betlen
da539cc2ee
Safer calculation of default n_threads
2023-04-06 21:22:19 -04:00
Andrei Betlen
930db37dd2
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-04-06 21:07:38 -04:00
Andrei Betlen
55279b679d
Handle prompt list
2023-04-06 21:07:35 -04:00
MillionthOdin16
c283edd7f2
Set n_batch to default values and reduce thread count:
...
Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects quality of generations. (It shouldn't) But in case that's still an issue I changed to default.
Set auto-determined num of threads to 1/2 system count. ggml will sometimes lock cores at 100% while doing nothing. This is being addressed, but can cause bad experience for user if pegged at 100%
2023-04-05 18:17:29 -04:00
MillionthOdin16
76a82babef
Set n_batch to the default value of 8. I think this is leftover from when n_ctx was missing and n_batch was 2048.
2023-04-05 17:44:53 -04:00
Andrei Betlen
44448fb3a8
Add server as a subpackage
2023-04-05 16:23:25 -04:00