baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	9eafc4c49a	Refactor server to use factory	2023-05-01 22:38:46 -04:00
Andrei Betlen	9ff9cdd7fc	Fix import error	2023-05-01 15:11:15 -04:00
Lucas Doyle	efe8e6f879	llama_cpp server: slight refactor to init_llama function. Define an init_llama function that starts llama with supplied settings instead of just doing it in the global context of app.py. This allows the test to be less brittle by not needing to mess with os.environ, then importing the app.	2023-04-29 11:42:23 -07:00
Lucas Doyle	6d8db9d017	tests: simple test for server module	2023-04-29 11:42:20 -07:00
Lucas Doyle	468377b0e2	llama_cpp server: app is now importable, still runnable as a module	2023-04-29 11:41:25 -07:00
Andrei Betlen	3cab3ef4cb	Update n_batch for server	2023-04-25 09:11:32 -04:00
Andrei Betlen	e4647c75ec	Add use_mmap flag to server	2023-04-19 15:57:46 -04:00
Andrei Betlen	92c077136d	Add experimental cache	2023-04-15 12:03:09 -04:00
Andrei Betlen	6c7cec0c65	Fix completion request	2023-04-14 10:01:15 -04:00
Andrei Betlen	4f5f99ef2a	Formatting	2023-04-12 22:40:12 -04:00
Andrei Betlen	0daf16defc	Enable logprobs on completion endpoint	2023-04-12 19:08:11 -04:00
Andrei Betlen	19598ac4e8	Fix threading bug. Closes #62	2023-04-12 19:07:53 -04:00
Andrei Betlen	b3805bb9cc	Implement logprobs parameter for text completion. Closes #2	2023-04-12 14:05:11 -04:00
Andrei Betlen	213cc5c340	Remove async from function signature to avoid blocking the server	2023-04-11 11:54:31 -04:00
Andrei Betlen	0067c1a588	Formatting	2023-04-08 16:01:18 -04:00
Andrei Betlen	da539cc2ee	Safer calculation of default n_threads	2023-04-06 21:22:19 -04:00
Andrei Betlen	930db37dd2	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-04-06 21:07:38 -04:00
Andrei Betlen	55279b679d	Handle prompt list	2023-04-06 21:07:35 -04:00
MillionthOdin16	c283edd7f2	Set n_batch to default values and reduce thread count: Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't), but in case that's still an issue I changed it to the default. Set the auto-determined number of threads to 1/2 of the system count. ggml will sometimes lock cores at 100% while doing nothing. This is being addressed, but it can cause a bad experience for the user if cores are pegged at 100%.	2023-04-05 18:17:29 -04:00
MillionthOdin16	76a82babef	Set n_batch to the default value of 8. I think this is leftover from when n_ctx was missing and n_batch was 2048.	2023-04-05 17:44:53 -04:00
Andrei Betlen	44448fb3a8	Add server as a subpackage	2023-04-05 16:23:25 -04:00

21 commits