baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei	f712a04f4e	Merge pull request #157 from th-neu/th-neu-readme-windows readme windows	2023-05-05 12:40:45 -04:00
Thomas Neu	22c3056b2a	Update README.md added MacOS	2023-05-05 18:40:00 +02:00
Andrei Betlen	b6a9a0b6ba	Add types for all low-level api functions	2023-05-05 12:22:27 -04:00
Andrei Betlen	5be0efa5f8	Cache should raise KeyError when key is missing	2023-05-05 12:21:49 -04:00
Andrei Betlen	24fc38754b	Add cli options to server. Closes #37	2023-05-05 12:08:28 -04:00
Thomas Neu	eb54e30f34	Update README.md	2023-05-05 14:22:41 +02:00
Thomas Neu	952ba9ecaf	Update README.md add windows server command	2023-05-05 14:21:57 +02:00
Andrei Betlen	5f583b0179	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-05-04 21:59:40 -04:00
Andrei Betlen	5c165a85da	Bump version	2023-05-04 21:59:37 -04:00
Andrei Betlen	853dc711cc	Format	2023-05-04 21:58:36 -04:00
Andrei Betlen	97c6372350	Rewind model to longest prefix.	2023-05-04 21:58:27 -04:00
Andrei	38b8eeea58	Merge pull request #154 from th-neu/th-neu-dockerfile-slim Slim-Bullseye based docker image	2023-05-04 19:59:23 -04:00
Thomas Neu	5672ed7fea	Merge branch 'abetlen:main' into th-neu-dockerfile-slim	2023-05-04 21:41:13 +02:00
Thomas Neu	501321875f	Slim-Bullseye based docker image ends up at ~669MB	2023-05-04 21:03:19 +02:00
Mug	0e9f227afd	Update low level examples	2023-05-04 18:33:08 +02:00
Andrei Betlen	cabd8b8ed1	Bump version	2023-05-04 12:21:20 -04:00
Andrei Betlen	d78cec67df	Update llama.cpp	2023-05-04 12:20:25 -04:00
Andrei Betlen	329297fafb	Bugfix: Missing logits_to_logprobs	2023-05-04 12:18:40 -04:00
Andrei Betlen	d594892fd4	Remove Docker CUDA build job	2023-05-04 00:02:46 -04:00
Andrei Betlen	0607f6578e	Use network installer for cuda	2023-05-03 23:22:16 -04:00
Andrei Betlen	6d3c20e39d	Add CUDA docker image build to github actions	2023-05-03 22:20:53 -04:00
Lucas Doyle	3008a954c1	Merge branch 'main' of github.com:abetlen/llama-cpp-python into better-server-params-and-fields	2023-05-03 13:10:03 -07:00
Andrei Betlen	a02aa121da	Remove cuda build job	2023-05-03 10:50:48 -04:00
Andrei Betlen	07a56dd9c2	Update job name	2023-05-03 10:39:39 -04:00
Andrei Betlen	7839eb14d3	Add docker cuda image. Closes #143	2023-05-03 10:29:05 -04:00
Andrei Betlen	9e5b6d675a	Improve logging messages	2023-05-03 10:28:10 -04:00
Andrei Betlen	43f2907e3a	Support smaller state sizes	2023-05-03 09:33:50 -04:00
Andrei Betlen	1d47cce222	Update llama.cpp	2023-05-03 09:33:30 -04:00
Lucas Doyle	b9098b0ef7	llama_cpp server: prompt is a string. Not sure why this union type was here but taking a look at llama.py, prompt is only ever processed as a string for completion. This was breaking types when generating an openapi client	2023-05-02 14:47:07 -07:00
Lucas Doyle	0fcc25cdac	examples fastapi_server: deprecate. This commit "deprecates" the example fastapi server by remaining runnable but pointing folks at the module if they want to learn more. Rationale: Currently there exist two server implementations in this repo: - `llama_cpp/server/__main__.py`, the module that's runnable by consumers of the library with `python3 -m llama_cpp.server` - `examples/high_level_api/fastapi_server.py`, which is probably a copy-pasted example by folks hacking around. IMO this is confusing. As a new user of the library I see they've both been updated relatively recently but looking side-by-side there's a diff. The one in the module seems better: - supports logits_all - supports use_mmap - has experimental cache support (with some mutex thing going on) - some stuff with streaming support was moved around more recently than fastapi_server.py	2023-05-01 22:34:23 -07:00
Andrei Betlen	c2e31eecee	Update permissions	2023-05-02 01:23:17 -04:00
Andrei Betlen	63f8d3a6fb	Update context	2023-05-02 01:16:44 -04:00
Andrei Betlen	c21a34506e	Update permissions	2023-05-02 01:13:43 -04:00
Andrei Betlen	872b2ec33f	Clone submodules	2023-05-02 01:11:34 -04:00
Andrei Betlen	62de4692f2	Fix missing dependency	2023-05-02 01:09:27 -04:00
Andrei	25062cecd3	Merge pull request #140 from abetlen/Niek/main Add Dockerfile	2023-05-02 01:06:00 -04:00
Andrei Betlen	36c81489e7	Remove docker section of publish	2023-05-02 01:04:36 -04:00
Andrei Betlen	5d5421b29d	Add build docker	2023-05-02 01:04:02 -04:00
Andrei Betlen	81631afc48	Install from local directory	2023-05-02 00:55:51 -04:00
Andrei Betlen	d605408f99	Add dockerignore	2023-05-02 00:55:34 -04:00
Andrei	e644e75915	Merge pull request #139 from matthoffner/patch-1 Fix FTYPE typo	2023-05-02 00:33:45 -04:00
Matt Hoffner	f97ff3c5bb	Update llama_cpp.py	2023-05-01 20:40:06 -07:00
Andrei Betlen	e9e0654aed	Bump version	2023-05-01 22:52:25 -04:00
Andrei	7ab08b8d10	Merge branch 'main' into better-server-params-and-fields	2023-05-01 22:45:57 -04:00
Andrei Betlen	46e3c4b84a	Fix	2023-05-01 22:41:54 -04:00
Andrei Betlen	9eafc4c49a	Refactor server to use factory	2023-05-01 22:38:46 -04:00
Andrei Betlen	dd9ad1c759	Formatting	2023-05-01 21:51:16 -04:00
Lucas Doyle	dbbfc4ba2f	llama_cpp server: fix to ChatCompletionRequestMessage. When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage. These fix that: - I think `Union[Literal["user"], Literal["channel"], ...]` is the same as Literal["user", "channel", ...] - Turns out default value `Literal["user"]` isn't JSON serializable, so replace with "user"	2023-05-01 15:38:19 -07:00
Lucas Doyle	fa2a61e065	llama_cpp server: fields for the embedding endpoint	2023-05-01 15:38:19 -07:00
Lucas Doyle	8dcbf65a45	llama_cpp server: define fields for chat completions. Slight refactor for common fields shared between completion and chat completion	2023-05-01 15:38:19 -07:00


1422 commits