baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Thomas Neu	501321875f	Slim-Bullseye based docker image ends up at ~669MB	2023-05-04 21:03:19 +02:00
Mug	0e9f227afd	Update low level examples	2023-05-04 18:33:08 +02:00
Andrei Betlen	cabd8b8ed1	Bump version	2023-05-04 12:21:20 -04:00
Andrei Betlen	d78cec67df	Update llama.cpp	2023-05-04 12:20:25 -04:00
Andrei Betlen	329297fafb	Bugfix: Missing logits_to_logprobs	2023-05-04 12:18:40 -04:00
Andrei Betlen	d594892fd4	Remove Docker CUDA build job	2023-05-04 00:02:46 -04:00
Andrei Betlen	0607f6578e	Use network installer for cuda	2023-05-03 23:22:16 -04:00
Andrei Betlen	6d3c20e39d	Add CUDA docker image build to github actions	2023-05-03 22:20:53 -04:00
Lucas Doyle	3008a954c1	Merge branch 'main' of github.com:abetlen/llama-cpp-python into better-server-params-and-fields	2023-05-03 13:10:03 -07:00
Andrei Betlen	a02aa121da	Remove cuda build job	2023-05-03 10:50:48 -04:00
Andrei Betlen	07a56dd9c2	Update job name	2023-05-03 10:39:39 -04:00
Andrei Betlen	7839eb14d3	Add docker cuda image. Closes #143	2023-05-03 10:29:05 -04:00
Andrei Betlen	9e5b6d675a	Improve logging messages	2023-05-03 10:28:10 -04:00
Andrei Betlen	43f2907e3a	Support smaller state sizes	2023-05-03 09:33:50 -04:00
Andrei Betlen	1d47cce222	Update llama.cpp	2023-05-03 09:33:30 -04:00
Lucas Doyle	b9098b0ef7	llama_cpp server: prompt is a string. Not sure why this union type was here, but taking a look at llama.py, prompt is only ever processed as a string for completion. This was breaking types when generating an OpenAPI client.	2023-05-02 14:47:07 -07:00
Lucas Doyle	0fcc25cdac	examples fastapi_server: deprecate. This commit "deprecates" the example fastapi server by remaining runnable but pointing folks at the module if they want to learn more. Rationale: currently there exist two server implementations in this repo: `llama_cpp/server/__main__.py`, the module that's runnable by consumers of the library with `python3 -m llama_cpp.server`; and `examples/high_level_api/fastapi_server.py`, which is probably a copy-pasted example by folks hacking around. IMO this is confusing. As a new user of the library I see they've both been updated relatively recently, but looking side-by-side there's a diff. The one in the module seems better: it supports logits_all; supports use_mmap; has experimental cache support (with some mutex thing going on); and some stuff with streaming support was moved around more recently than fastapi_server.py.	2023-05-01 22:34:23 -07:00
Andrei Betlen	c2e31eecee	Update permissions	2023-05-02 01:23:17 -04:00
Andrei Betlen	63f8d3a6fb	Update context	2023-05-02 01:16:44 -04:00
Andrei Betlen	c21a34506e	Update permissions	2023-05-02 01:13:43 -04:00
Andrei Betlen	872b2ec33f	Clone submodules	2023-05-02 01:11:34 -04:00
Andrei Betlen	62de4692f2	Fix missing dependency	2023-05-02 01:09:27 -04:00
Andrei	25062cecd3	Merge pull request #140 from abetlen/Niek/main: Add Dockerfile	2023-05-02 01:06:00 -04:00
Andrei Betlen	36c81489e7	Remove docker section of publish	2023-05-02 01:04:36 -04:00
Andrei Betlen	5d5421b29d	Add build docker	2023-05-02 01:04:02 -04:00
Andrei Betlen	81631afc48	Install from local directory	2023-05-02 00:55:51 -04:00
Andrei Betlen	d605408f99	Add dockerignore	2023-05-02 00:55:34 -04:00
Andrei	e644e75915	Merge pull request #139 from matthoffner/patch-1: Fix FTYPE typo	2023-05-02 00:33:45 -04:00
Matt Hoffner	f97ff3c5bb	Update llama_cpp.py	2023-05-01 20:40:06 -07:00
Andrei Betlen	e9e0654aed	Bump version	2023-05-01 22:52:25 -04:00
Andrei	7ab08b8d10	Merge branch 'main' into better-server-params-and-fields	2023-05-01 22:45:57 -04:00
Andrei Betlen	46e3c4b84a	Fix	2023-05-01 22:41:54 -04:00
Andrei Betlen	9eafc4c49a	Refactor server to use factory	2023-05-01 22:38:46 -04:00
Andrei Betlen	dd9ad1c759	Formatting	2023-05-01 21:51:16 -04:00
Lucas Doyle	dbbfc4ba2f	llama_cpp server: fix to ChatCompletionRequestMessage. When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage. These fix that: (1) I think `Union[Literal["user"], Literal["channel"], ...]` is the same as `Literal["user", "channel", ...]`; (2) turns out default value `Literal["user"]` isn't JSON serializable, so replace with "user".	2023-05-01 15:38:19 -07:00
Lucas Doyle	fa2a61e065	llama_cpp server: fields for the embedding endpoint	2023-05-01 15:38:19 -07:00
Lucas Doyle	8dcbf65a45	llama_cpp server: define fields for chat completions. Slight refactor for common fields shared between completion and chat completion.	2023-05-01 15:38:19 -07:00
Lucas Doyle	978b6daf93	llama_cpp server: add some more information to fields for completions	2023-05-01 15:38:19 -07:00
Lucas Doyle	a5aa6c1478	llama_cpp server: add missing top_k param to CreateChatCompletionRequest. `llama.create_chat_completion` definitely has a `top_k` argument, but it's missing from `CreateChatCompletionRequest`. Decision: add it.	2023-05-01 15:38:19 -07:00
Lucas Doyle	1e42913599	llama_cpp server: move logprobs to supported. I think this is actually supported (it's in the arguments of `LLama.__call__`, which is how the completion is invoked). Decision: mark as supported.	2023-05-01 15:38:19 -07:00
Lucas Doyle	b47b9549d5	llama_cpp server: delete some ignored / unused parameters. `n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. Decision: delete them.	2023-05-01 15:38:19 -07:00
Lucas Doyle	e40fcb0575	llama_cpp server: mark model as required. `model` is ignored, but currently marked "optional"... On the one hand, could mark it "required" to make it explicit in case the server supports multiple llamas at the same time, but also could delete it since it's ignored. Decision: mark it required for the sake of OpenAI API compatibility. I think out of all parameters, `model` is probably the most important one for people to keep using even if it's ignored for now.	2023-05-01 15:38:19 -07:00
Andrei Betlen	9d60ae56f2	Fix whitespace	2023-05-01 18:07:45 -04:00
Andrei Betlen	53c0129eb6	Update submodule clone instructions	2023-05-01 18:07:15 -04:00
Andrei Betlen	b6747f722e	Fix logprob calculation. Fixes #134	2023-05-01 17:45:08 -04:00
Andrei Betlen	c088a2b3a7	Un-skip tests	2023-05-01 15:46:03 -04:00
Andrei Betlen	bf3d0dcb2c	Fix tests	2023-05-01 15:28:46 -04:00
Andrei Betlen	5034bbf499	Bump version	2023-05-01 15:23:59 -04:00
Andrei Betlen	f073ef0571	Update llama.cpp	2023-05-01 15:23:01 -04:00
Andrei Betlen	9ff9cdd7fc	Fix import error	2023-05-01 15:11:15 -04:00

1259 commits
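
Commit dbbfc4ba2f above makes two claims about Python typing: that a `Union` of single-value `Literal` types admits the same values as one multi-value `Literal`, and that a `Literal[...]` object cannot be used as a JSON-serializable default. The following is a minimal standard-library sketch of both points, not the repository's code. The names `UnionOfLiterals`, `SingleLiteral`, `allowed_values`, and `is_json_serializable` are hypothetical, and only the two role names quoted in the commit message are used.

```python
import json
from typing import Literal, Union, get_args

# Hypothetical aliases for illustration; the real field lives in the server module.
UnionOfLiterals = Union[Literal["user"], Literal["channel"]]
SingleLiteral = Literal["user", "channel"]


def allowed_values(tp):
    """Collect the literal values a type admits, flattening a Union of Literals."""
    values = []
    for arg in get_args(tp):
        # get_args() returns the inner values for a nested Literal[...]
        # and an empty tuple for a plain value such as "user".
        inner = get_args(arg)
        values.extend(inner if inner else [arg])
    return tuple(values)


def is_json_serializable(value):
    """Return True if json.dumps accepts the value, False if it raises TypeError."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


# Both spellings admit the same set of values.
print(allowed_values(UnionOfLiterals) == allowed_values(SingleLiteral))

# The typing object is not serializable; the plain string is.
print(is_json_serializable(Literal["user"]), is_json_serializable("user"))
```

The two spellings are distinct objects at runtime, which is why the sketch compares the values they admit rather than the types themselves.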