baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Shouyi Wang	9f21f548a5	Add tensor split	2023-07-09 23:00:59 +10:00
Andrei Betlen	52753b77f5	Upgrade fastapi to 0.100.0 and pydantic v2	2023-07-07 21:38:46 -04:00
Andrei Betlen	57d8ec3899	Add setting to control request interruption	2023-07-07 03:37:23 -04:00
Andrei Betlen	4c7cdcca00	Add interruptible streaming requests for llama-cpp-python server. Closes #183	2023-07-07 03:04:17 -04:00
Alexey	282698b6d3	server: pass seed param from command line to llama	2023-06-23 00:19:24 +04:00
Andrei Betlen	1e20be6d0c	Add low_vram to server settings	2023-06-14 22:13:42 -04:00
Andrei Betlen	f7c5cfaf50	Format server options	2023-06-14 22:08:28 -04:00
Andrei Betlen	9c41a3e990	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-06-14 21:50:43 -04:00
Andrei	f568baeef1	Merge pull request #351 from player1537-forks/th/add-logits-bias-parameter: Add support for `logit_bias` and `logit_bias_type` parameters	2023-06-14 21:49:56 -04:00
Andrei Betlen	f27393ab7e	Add additional verbose logs for cache	2023-06-14 21:46:48 -04:00
Gabor	3ea31930e5	fixes abetlen/llama-cpp-python #358	2023-06-11 00:58:08 +01:00
Tanner Hobson	eb7645b3ba	Add support for logit_bias and logit_bias_type parameters	2023-06-09 13:13:08 -04:00
Andrei Betlen	0c42168508	Fix cache implementation breaking changes	2023-06-08 13:19:23 -04:00
Eric B	9b1c9e902c	Added mirostat support for completions, chat completions API	2023-06-05 22:37:11 -04:00
Andrei Betlen	80066f0b80	Use async routes	2023-05-27 09:12:58 -04:00
Andrei Betlen	c2b59a5f59	Import unused import	2023-05-26 22:59:29 -04:00
Simon Chabot	e783f1c191	feat: make embedding support a list of strings as input. Makes the /v1/embedding route similar to the OpenAI API.	2023-05-20 01:23:32 +02:00
Andrei Betlen	a8cd169251	Bugfix: Stop sequences can be strings	2023-05-19 03:15:08 -04:00
Andrei Betlen	dc39cc0fa4	Use server sent events function for streaming completion	2023-05-19 02:04:30 -04:00
Andrei Betlen	a3352923c7	Add model_alias option to override model_path in completions. Closes #39	2023-05-16 17:22:00 -04:00
Andrei Betlen	cdf59768f5	Update llama.cpp	2023-05-14 00:04:22 -04:00
Andrei Betlen	8740ddc58e	Only support generating one prompt at a time.	2023-05-12 07:21:46 -04:00
Andrei Betlen	8895b9002a	Revert "llama_cpp server: prompt is a string". Closes #187. This reverts commit `b9098b0ef7`.	2023-05-12 07:16:57 -04:00
Lucas Doyle	02e8a018ae	llama_cpp server: document presence_penalty and frequency_penalty, mark as supported	2023-05-09 16:25:00 -07:00
Andrei Betlen	82d138fe54	Fix: default repeat_penalty	2023-05-08 18:49:11 -04:00
Andrei Betlen	0d751a69a7	Set repeat_penalty to 0 by default	2023-05-08 01:50:43 -04:00
Andrei Betlen	65d9cc050c	Add openai frequency and presence penalty parameters. Closes #169	2023-05-08 01:30:18 -04:00
Andrei Betlen	a0b61ea2a7	Bugfix for models endpoint	2023-05-07 20:17:52 -04:00
Andrei Betlen	14da46f16e	Added cache size to settings object.	2023-05-07 19:33:17 -04:00
Andrei Betlen	627811ea83	Add verbose flag to server	2023-05-07 05:09:10 -04:00
Andrei Betlen	3fbda71790	Fix mlock_supported and mmap_supported return type	2023-05-07 03:04:22 -04:00
Andrei Betlen	5a3413eee3	Update cpu_count	2023-05-07 03:03:57 -04:00
Andrei Betlen	1a00e452ea	Update settings fields and defaults	2023-05-07 02:52:20 -04:00
Andrei Betlen	86753976c4	Revert "llama_cpp server: delete some ignored / unused parameters". This reverts commit `b47b9549d5`.	2023-05-07 02:02:34 -04:00
Andrei Betlen	c382d8f86a	Revert "llama_cpp server: mark model as required". This reverts commit `e40fcb0575`.	2023-05-07 02:00:22 -04:00
Lucas Doyle	b9098b0ef7	llama_cpp server: prompt is a string. Not sure why this union type was here, but taking a look at llama.py, prompt is only ever processed as a string for completion. This was breaking types when generating an openapi client.	2023-05-02 14:47:07 -07:00
Andrei	7ab08b8d10	Merge branch 'main' into better-server-params-and-fields	2023-05-01 22:45:57 -04:00
Andrei Betlen	9eafc4c49a	Refactor server to use factory	2023-05-01 22:38:46 -04:00
Lucas Doyle	dbbfc4ba2f	llama_cpp server: fix to ChatCompletionRequestMessage. When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage. These changes fix that: (1) I think `Union[Literal["user"], Literal["channel"], ...]` is the same as `Literal["user", "channel", ...]`; (2) it turns out the default value `Literal["user"]` isn't JSON serializable, so replace it with "user".	2023-05-01 15:38:19 -07:00
Lucas Doyle	fa2a61e065	llama_cpp server: fields for the embedding endpoint	2023-05-01 15:38:19 -07:00
Lucas Doyle	8dcbf65a45	llama_cpp server: define fields for chat completions. Slight refactor for common fields shared between completion and chat completion.	2023-05-01 15:38:19 -07:00
Lucas Doyle	978b6daf93	llama_cpp server: add some more information to fields for completions	2023-05-01 15:38:19 -07:00
Lucas Doyle	a5aa6c1478	llama_cpp server: add missing top_k param to CreateChatCompletionRequest. `llama.create_chat_completion` definitely has a `top_k` argument, but it's missing from `CreateChatCompletionRequest`. Decision: add it.	2023-05-01 15:38:19 -07:00
Lucas Doyle	1e42913599	llama_cpp server: move logprobs to supported. I think this is actually supported (it's in the arguments of `LLama.__call__`, which is how the completion is invoked). Decision: mark as supported.	2023-05-01 15:38:19 -07:00
Lucas Doyle	b47b9549d5	llama_cpp server: delete some ignored / unused parameters. `n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. Decision: delete them.	2023-05-01 15:38:19 -07:00
Lucas Doyle	e40fcb0575	llama_cpp server: mark model as required. `model` is ignored, but currently marked "optional". On the one hand, it could be marked "required" to make it explicit in case the server supports multiple llamas at the same time; on the other, it could be deleted since it's ignored. Decision: mark it required for the sake of OpenAI API compatibility. I think out of all parameters, `model` is probably the most important one for people to keep using even if it's ignored for now.	2023-05-01 15:38:19 -07:00
Andrei Betlen	9ff9cdd7fc	Fix import error	2023-05-01 15:11:15 -04:00
Lucas Doyle	efe8e6f879	llama_cpp server: slight refactor to init_llama function. Define an init_llama function that starts llama with supplied settings instead of just doing it in the global context of app.py. This allows the test to be less brittle by not needing to mess with os.environ and then importing the app.	2023-04-29 11:42:23 -07:00
Lucas Doyle	6d8db9d017	tests: simple test for server module	2023-04-29 11:42:20 -07:00
Lucas Doyle	468377b0e2	llama_cpp server: app is now importable, still runnable as a module	2023-04-29 11:41:25 -07:00

100 commits