baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Lucas Doyle	0fcc25cdac	examples fastapi_server: deprecate This commit "deprecates" the example fastapi server by remaining runnable but pointing folks at the module if they want to learn more. Rationale: Currently there exist two server implementations in this repo: - `llama_cpp/server/__main__.py`, the module that's runnable by consumers of the library with `python3 -m llama_cpp.server` - `examples/high_level_api/fastapi_server.py`, which is probably a copy-pasted example by folks hacking around IMO this is confusing. As a new user of the library I see they've both been updated relatively recently but looking side-by-side there's a diff. The one in the module seems better: - supports logits_all - supports use_mmap - has experimental cache support (with some mutex thing going on) - some stuff with streaming support was moved around more recently than fastapi_server.py	2023-05-01 22:34:23 -07:00
Andrei Betlen	c2e31eecee	Update permissions	2023-05-02 01:23:17 -04:00
Andrei Betlen	63f8d3a6fb	Update context	2023-05-02 01:16:44 -04:00
Andrei Betlen	c21a34506e	Update permissions	2023-05-02 01:13:43 -04:00
Andrei Betlen	872b2ec33f	Clone submodules	2023-05-02 01:11:34 -04:00
Andrei Betlen	62de4692f2	Fix missing dependency	2023-05-02 01:09:27 -04:00
Andrei	25062cecd3	Merge pull request #140 from abetlen/Niek/main Add Dockerfile	2023-05-02 01:06:00 -04:00
Andrei Betlen	36c81489e7	Remove docker section of publish	2023-05-02 01:04:36 -04:00
Andrei Betlen	5d5421b29d	Add build docker	2023-05-02 01:04:02 -04:00
Andrei Betlen	81631afc48	Install from local directory	2023-05-02 00:55:51 -04:00
Andrei Betlen	d605408f99	Add dockerignore	2023-05-02 00:55:34 -04:00
Andrei	e644e75915	Merge pull request #139 from matthoffner/patch-1 Fix FTYPE typo	2023-05-02 00:33:45 -04:00
Matt Hoffner	f97ff3c5bb	Update llama_cpp.py	2023-05-01 20:40:06 -07:00
Andrei Betlen	e9e0654aed	Bump version	2023-05-01 22:52:25 -04:00
Andrei	7ab08b8d10	Merge branch 'main' into better-server-params-and-fields	2023-05-01 22:45:57 -04:00
Andrei Betlen	46e3c4b84a	Fix	2023-05-01 22:41:54 -04:00
Andrei Betlen	9eafc4c49a	Refactor server to use factory	2023-05-01 22:38:46 -04:00
Andrei Betlen	dd9ad1c759	Formatting	2023-05-01 21:51:16 -04:00
Lucas Doyle	dbbfc4ba2f	llama_cpp server: fix to ChatCompletionRequestMessage When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage These fix that: - I think `Union[Literal["user"], Literal["channel"], ...]` is the same as Literal["user", "channel", ...] - Turns out default value `Literal["user"]` isn't JSON serializable, so replace with "user"	2023-05-01 15:38:19 -07:00
Lucas Doyle	fa2a61e065	llama_cpp server: fields for the embedding endpoint	2023-05-01 15:38:19 -07:00
Lucas Doyle	8dcbf65a45	llama_cpp server: define fields for chat completions Slight refactor for common fields shared between completion and chat completion	2023-05-01 15:38:19 -07:00
Lucas Doyle	978b6daf93	llama_cpp server: add some more information to fields for completions	2023-05-01 15:38:19 -07:00
Lucas Doyle	a5aa6c1478	llama_cpp server: add missing top_k param to CreateChatCompletionRequest `llama.create_chat_completion` definitely has a `top_k` argument, but it's missing from `CreateChatCompletionRequest`. decision: add it	2023-05-01 15:38:19 -07:00
Lucas Doyle	1e42913599	llama_cpp server: move logprobs to supported I think this is actually supported (it's in the arguments of `LLama.__call__`, which is how the completion is invoked). decision: mark as supported	2023-05-01 15:38:19 -07:00
Lucas Doyle	b47b9549d5	llama_cpp server: delete some ignored / unused parameters `n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. decision: delete it	2023-05-01 15:38:19 -07:00
Lucas Doyle	e40fcb0575	llama_cpp server: mark model as required `model` is ignored, but currently marked "optional"... on the one hand could mark "required" to make it explicit in case the server supports multiple llamas at the same time, but also could delete it since it's ignored. decision: mark it required for the sake of openai api compatibility. I think out of all parameters, `model` is probably the most important one for people to keep using even if it's ignored for now.	2023-05-01 15:38:19 -07:00
Andrei Betlen	9d60ae56f2	Fix whitespace	2023-05-01 18:07:45 -04:00
Andrei Betlen	53c0129eb6	Update submodule clone instructions	2023-05-01 18:07:15 -04:00
Andrei Betlen	b6747f722e	Fix logprob calculation. Fixes #134	2023-05-01 17:45:08 -04:00
Andrei Betlen	c088a2b3a7	Un-skip tests	2023-05-01 15:46:03 -04:00
Andrei Betlen	bf3d0dcb2c	Fix tests	2023-05-01 15:28:46 -04:00
Andrei Betlen	5034bbf499	Bump version	2023-05-01 15:23:59 -04:00
Andrei Betlen	f073ef0571	Update llama.cpp	2023-05-01 15:23:01 -04:00
Andrei Betlen	9ff9cdd7fc	Fix import error	2023-05-01 15:11:15 -04:00
Andrei Betlen	2f8a3adaa4	Temporarily skip sampling tests.	2023-05-01 15:01:49 -04:00
Andrei Betlen	dbe0ad86c8	Update test dependencies	2023-05-01 14:50:01 -04:00
Andrei Betlen	350a1769e1	Update sampling api	2023-05-01 14:47:55 -04:00
Andrei Betlen	7837c3fdc7	Fix return types and import comments	2023-05-01 14:02:06 -04:00
Andrei Betlen	55d6308537	Fix test dependencies	2023-05-01 11:39:18 -04:00
Andrei Betlen	ccf1ed54ae	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-05-01 11:35:14 -04:00
Andrei	79ba9ed98d	Merge pull request #125 from Stonelinks/app-server-module-importable Make app server module importable	2023-05-01 11:31:08 -04:00
Andrei Betlen	80184a286c	Update llama.cpp	2023-05-01 10:44:28 -04:00
Lucas Doyle	efe8e6f879	llama_cpp server: slight refactor to init_llama function Define an init_llama function that starts llama with supplied settings instead of just doing it in the global context of app.py This allows the test to be less brittle by not needing to mess with os.environ, then importing the app	2023-04-29 11:42:23 -07:00
Lucas Doyle	6d8db9d017	tests: simple test for server module	2023-04-29 11:42:20 -07:00
Lucas Doyle	468377b0e2	llama_cpp server: app is now importable, still runnable as a module	2023-04-29 11:41:25 -07:00
Andrei	755f9fa455	Merge pull request #118 from SagsMug/main Fix UnicodeDecodeError permanently	2023-04-29 07:19:01 -04:00
Mug	18a0c10032	Remove excessive errors="ignore" and add utf8 test	2023-04-29 12:19:22 +02:00
Andrei Betlen	523825e91d	Update README	2023-04-28 17:12:03 -04:00
Andrei Betlen	e00beb13b5	Update README	2023-04-28 17:08:18 -04:00
Andrei Betlen	5423d047c7	Bump version	2023-04-28 15:33:08 -04:00
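
Commit dbbfc4ba2f makes two claims about `ChatCompletionRequestMessage`: that a `Union` of single-value `Literal` types accepts the same values as one combined `Literal`, and that the typing object `Literal["user"]` cannot be JSON-serialized while the plain string "user" can. The following is a minimal sketch of both points using only the standard library. The helper names `allowed_values` and `is_json_serializable` are hypothetical and do not appear in the repository, and the role names are just the two spelled out in the commit message.

```python
import json
from typing import Literal, Union, get_args, get_origin

# The two spellings compared in the commit message.
UnionOfLiterals = Union[Literal["user"], Literal["channel"]]
CombinedLiteral = Literal["user", "channel"]


def allowed_values(tp):
    """Flatten a Literal, or a Union of Literals, into a tuple of plain values.

    Hypothetical helper, for illustration only.
    """
    values = []
    for arg in get_args(tp):
        if get_origin(arg) is Literal:
            # Union member: unwrap the inner Literal's values.
            values.extend(get_args(arg))
        else:
            # Already a plain value taken from a combined Literal.
            values.append(arg)
    return tuple(values)


def is_json_serializable(value):
    """Return True if json.dumps accepts the value, False on TypeError."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


# Both spellings admit the same set of values.
same_values = allowed_values(UnionOfLiterals) == allowed_values(CombinedLiteral)

# A typing object used as a default cannot be dumped to JSON; the string can.
literal_default_ok = is_json_serializable(Literal["user"])
string_default_ok = is_json_serializable("user")
```

Here `same_values` and `string_default_ok` come out true and `literal_default_ok` comes out false, which matches the reasoning in the commit: use the combined `Literal` for the field type and a plain string for the default.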


843 commits
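
Commit efe8e6f879 describes moving model start-up out of module-level code and into an `init_llama` function that takes settings, so a test does not have to modify `os.environ` before importing the app. The sketch below illustrates that pattern only and is not the repository's code. `Settings`, `FakeLlama`, and the `MODEL` environment variable are assumptions made for this example; only the name `init_llama` comes from the commit message.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    """Hypothetical settings object; field names are assumptions."""
    model: str
    n_ctx: int = 2048


class FakeLlama:
    """Stand-in for the real model class so the sketch runs without a model file."""

    def __init__(self, model_path: str, n_ctx: int) -> None:
        self.model_path = model_path
        self.n_ctx = n_ctx


# Module-level handle, populated only when init_llama is called.
llama: Optional[FakeLlama] = None


def init_llama(settings: Optional[Settings] = None) -> FakeLlama:
    """Create the model from explicit settings, falling back to the environment."""
    global llama
    if settings is None:
        # Assumed fallback for running as a module; tests pass settings instead.
        settings = Settings(model=os.environ["MODEL"])
    llama = FakeLlama(model_path=settings.model, n_ctx=settings.n_ctx)
    return llama


# A test can now build the model directly, with no environment set-up.
test_llama = init_llama(Settings(model="models/test.bin", n_ctx=512))
```

Because importing the module no longer has side effects, the app can be imported by tests and still initialized from the environment when run as a module.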