655 commits

Author SHA1 Message Date
Andrei Betlen
d78cec67df Update llama.cpp 2023-05-04 12:20:25 -04:00
Andrei Betlen
329297fafb Bugfix: Missing logits_to_logprobs 2023-05-04 12:18:40 -04:00
Andrei Betlen
d594892fd4 Remove Docker CUDA build job 2023-05-04 00:02:46 -04:00
Andrei Betlen
0607f6578e Use network installer for cuda 2023-05-03 23:22:16 -04:00
Andrei Betlen
6d3c20e39d Add CUDA docker image build to github actions 2023-05-03 22:20:53 -04:00
Lucas Doyle
3008a954c1 Merge branch 'main' of github.com:abetlen/llama-cpp-python into better-server-params-and-fields 2023-05-03 13:10:03 -07:00
Andrei Betlen
a02aa121da Remove cuda build job 2023-05-03 10:50:48 -04:00
Andrei Betlen
07a56dd9c2 Update job name 2023-05-03 10:39:39 -04:00
Andrei Betlen
7839eb14d3 Add docker cuda image. Closes #143 2023-05-03 10:29:05 -04:00
Andrei Betlen
9e5b6d675a Improve logging messages 2023-05-03 10:28:10 -04:00
Andrei Betlen
43f2907e3a Support smaller state sizes 2023-05-03 09:33:50 -04:00
Andrei Betlen
1d47cce222 Update llama.cpp 2023-05-03 09:33:30 -04:00
Lucas Doyle
b9098b0ef7 llama_cpp server: prompt is a string
Not sure why this union type was here, but looking at llama.py, prompt is only ever processed as a string for completion.

This was breaking types when generating an OpenAPI client.
2023-05-02 14:47:07 -07:00
Lucas Doyle
0fcc25cdac examples fastapi_server: deprecate
This commit "deprecates" the example FastAPI server: it remains runnable but points folks at the module if they want to learn more.

Rationale:

Currently there exist two server implementations in this repo:

- `llama_cpp/server/__main__.py`, the module that's runnable by consumers of the library with `python3 -m llama_cpp.server`
- `examples/high_level_api/fastapi_server.py`, which is probably a copy-pasted example by folks hacking around

IMO this is confusing. As a new user of the library, I see that both have been updated relatively recently, but a side-by-side comparison shows they differ.

The one in the module seems better:
- supports logits_all
- supports use_mmap
- has experimental cache support (with some mutex thing going on)
- some stuff with streaming support was moved around more recently than fastapi_server.py
2023-05-01 22:34:23 -07:00
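One way an example script can stay runnable while steering readers to the module is to emit a deprecation warning at startup. The following is a minimal sketch under that assumption, not the code from this commit; `MODULE_COMMAND`, `deprecation_notice`, and `warn_deprecated` are hypothetical names introduced for illustration.

```python
import warnings

# Command quoted in the commit message as the supported entry point.
MODULE_COMMAND = "python3 -m llama_cpp.server"


def deprecation_notice() -> str:
    """Build the message shown to anyone still running the example script."""
    return (
        "examples/high_level_api/fastapi_server.py is deprecated; "
        f"run `{MODULE_COMMAND}` instead."
    )


def warn_deprecated() -> None:
    """Emit a DeprecationWarning without stopping the script (hypothetical helper)."""
    warnings.warn(deprecation_notice(), DeprecationWarning, stacklevel=2)
```

Calling `warn_deprecated()` at the top of the example would leave the rest of the script untouched, which matches the "remains runnable" intent described above.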
Andrei Betlen
c2e31eecee Update permissions 2023-05-02 01:23:17 -04:00
Andrei Betlen
63f8d3a6fb Update context 2023-05-02 01:16:44 -04:00
Andrei Betlen
c21a34506e Update permissions 2023-05-02 01:13:43 -04:00
Andrei Betlen
872b2ec33f Clone submodules 2023-05-02 01:11:34 -04:00
Andrei Betlen
62de4692f2 Fix missing dependency 2023-05-02 01:09:27 -04:00
Andrei
25062cecd3 Merge pull request #140 from abetlen/Niek/main
Add Dockerfile
2023-05-02 01:06:00 -04:00
Andrei Betlen
36c81489e7 Remove docker section of publish 2023-05-02 01:04:36 -04:00
Andrei Betlen
5d5421b29d Add build docker 2023-05-02 01:04:02 -04:00
Andrei Betlen
81631afc48 Install from local directory 2023-05-02 00:55:51 -04:00
Andrei Betlen
d605408f99 Add dockerignore 2023-05-02 00:55:34 -04:00
Andrei
e644e75915 Merge pull request #139 from matthoffner/patch-1
Fix FTYPE typo
2023-05-02 00:33:45 -04:00
Matt Hoffner
f97ff3c5bb Update llama_cpp.py 2023-05-01 20:40:06 -07:00
Andrei Betlen
e9e0654aed Bump version 2023-05-01 22:52:25 -04:00
Andrei
7ab08b8d10 Merge branch 'main' into better-server-params-and-fields 2023-05-01 22:45:57 -04:00
Andrei Betlen
46e3c4b84a Fix 2023-05-01 22:41:54 -04:00
Andrei Betlen
9eafc4c49a Refactor server to use factory 2023-05-01 22:38:46 -04:00
Andrei Betlen
dd9ad1c759 Formatting 2023-05-01 21:51:16 -04:00
Lucas Doyle
dbbfc4ba2f llama_cpp server: fix to ChatCompletionRequestMessage
When I generate a client, it breaks because it fails to process the schema of `ChatCompletionRequestMessage`.

These changes fix that:
- I think `Union[Literal["user"], Literal["channel"], ...]` is the same as `Literal["user", "channel", ...]`
- Turns out the default value `Literal["user"]` isn't JSON serializable, so replace it with `"user"`
2023-05-01 15:38:19 -07:00
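The two observations in this commit message can be checked with the standard library alone. The sketch below is illustrative and not taken from the repository: `RoleUnion`, `RoleLiteral`, `allowed_values`, and `is_json_serializable` are hypothetical names, and only the two roles spelled out in the message are used.

```python
import json
from typing import Literal, Union, get_args

# Two spellings of the same set of allowed role strings.
RoleUnion = Union[Literal["user"], Literal["channel"]]
RoleLiteral = Literal["user", "channel"]


def allowed_values(tp) -> tuple:
    """Flatten a Literal, or a Union of Literals, into its plain values."""
    values = []
    for arg in get_args(tp):
        # A Union's args are Literal types; a Literal's args are plain values.
        inner = get_args(arg)
        values.extend(inner if inner else (arg,))
    return tuple(values)


def is_json_serializable(value) -> bool:
    """Return True if json.dumps accepts the value."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


# Both spellings admit the same values.
same = allowed_values(RoleUnion) == allowed_values(RoleLiteral)
# The Literal type object cannot be a JSON default; the plain string can.
type_ok = is_json_serializable(Literal["user"])
string_ok = is_json_serializable("user")
```

Here `same` and `string_ok` come out true while `type_ok` is false, which is why a schema generator chokes on a default of `Literal["user"]` but accepts `"user"`.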
Lucas Doyle
fa2a61e065 llama_cpp server: fields for the embedding endpoint 2023-05-01 15:38:19 -07:00
Lucas Doyle
8dcbf65a45 llama_cpp server: define fields for chat completions
Slight refactor for common fields shared between completion and chat completion
2023-05-01 15:38:19 -07:00
Lucas Doyle
978b6daf93 llama_cpp server: add some more information to fields for completions 2023-05-01 15:38:19 -07:00
Lucas Doyle
a5aa6c1478 llama_cpp server: add missing top_k param to CreateChatCompletionRequest
`llama.create_chat_completion` definitely has a `top_k` argument, but it's missing from `CreateChatCompletionRequest`. Decision: add it.
2023-05-01 15:38:19 -07:00
Lucas Doyle
1e42913599 llama_cpp server: move logprobs to supported
I think this is actually supported (it's in the arguments of `Llama.__call__`, which is how the completion is invoked). Decision: mark as supported.
2023-05-01 15:38:19 -07:00
Lucas Doyle
b47b9549d5 llama_cpp server: delete some ignored / unused parameters
`n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. Decision: delete them.
2023-05-01 15:38:19 -07:00
Lucas Doyle
e40fcb0575 llama_cpp server: mark model as required
`model` is ignored, but currently marked "optional". On the one hand, it could be marked "required" to make it explicit in case the server supports multiple llamas at the same time; on the other, it could be deleted since it's ignored. Decision: mark it required for the sake of OpenAI API compatibility.

I think out of all parameters, `model` is probably the most important one for people to keep using even if it's ignored for now.
2023-05-01 15:38:19 -07:00
Andrei Betlen
9d60ae56f2 Fix whitespace 2023-05-01 18:07:45 -04:00
Andrei Betlen
53c0129eb6 Update submodule clone instructions 2023-05-01 18:07:15 -04:00
Andrei Betlen
b6747f722e Fix logprob calculation. Fixes #134 2023-05-01 17:45:08 -04:00
Andrei Betlen
c088a2b3a7 Un-skip tests 2023-05-01 15:46:03 -04:00
Andrei Betlen
bf3d0dcb2c Fix tests 2023-05-01 15:28:46 -04:00
Andrei Betlen
5034bbf499 Bump version 2023-05-01 15:23:59 -04:00
Andrei Betlen
f073ef0571 Update llama.cpp 2023-05-01 15:23:01 -04:00
Andrei Betlen
9ff9cdd7fc Fix import error 2023-05-01 15:11:15 -04:00
Andrei Betlen
2f8a3adaa4 Temporarily skip sampling tests. 2023-05-01 15:01:49 -04:00
Andrei Betlen
dbe0ad86c8 Update test dependencies 2023-05-01 14:50:01 -04:00
Andrei Betlen
350a1769e1 Update sampling api 2023-05-01 14:47:55 -04:00