Andrei Betlen
5be0efa5f8
Cache should raise KeyError when key is missing
2023-05-05 12:21:49 -04:00
Andrei Betlen
24fc38754b
Add cli options to server. Closes #37
2023-05-05 12:08:28 -04:00
Thomas Neu
eb54e30f34
Update README.md
2023-05-05 14:22:41 +02:00
Thomas Neu
952ba9ecaf
Update README.md
...
add windows server commad
2023-05-05 14:21:57 +02:00
Andrei Betlen
5f583b0179
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-05-04 21:59:40 -04:00
Andrei Betlen
5c165a85da
Bump version
2023-05-04 21:59:37 -04:00
Andrei Betlen
853dc711cc
Format
2023-05-04 21:58:36 -04:00
Andrei Betlen
97c6372350
Rewind model to longest prefix.
2023-05-04 21:58:27 -04:00
Andrei
38b8eeea58
Merge pull request #154 from th-neu/th-neu-dockerfile-slim
...
Slim-Bullseye based docker image
2023-05-04 19:59:23 -04:00
Thomas Neu
5672ed7fea
Merge branch 'abetlen:main' into th-neu-dockerfile-slim
2023-05-04 21:41:13 +02:00
Thomas Neu
501321875f
Slim-Bullseye based docker image
...
ends up at ~669MB
2023-05-04 21:03:19 +02:00
Andrei Betlen
cabd8b8ed1
Bump version
2023-05-04 12:21:20 -04:00
Andrei Betlen
d78cec67df
Update llama.cpp
2023-05-04 12:20:25 -04:00
Andrei Betlen
329297fafb
Bugfix: Missing logits_to_logprobs
2023-05-04 12:18:40 -04:00
Andrei Betlen
d594892fd4
Remove Docker CUDA build job
2023-05-04 00:02:46 -04:00
Andrei Betlen
0607f6578e
Use network installer for cuda
2023-05-03 23:22:16 -04:00
Andrei Betlen
6d3c20e39d
Add CUDA docker image build to github actions
2023-05-03 22:20:53 -04:00
Lucas Doyle
3008a954c1
Merge branch 'main' of github.com:abetlen/llama-cpp-python into better-server-params-and-fields
2023-05-03 13:10:03 -07:00
Andrei Betlen
a02aa121da
Remove cuda build job
2023-05-03 10:50:48 -04:00
Andrei Betlen
07a56dd9c2
Update job name
2023-05-03 10:39:39 -04:00
Andrei Betlen
7839eb14d3
Add docker cuda image. Closes #143
2023-05-03 10:29:05 -04:00
Andrei Betlen
9e5b6d675a
Improve logging messages
2023-05-03 10:28:10 -04:00
Andrei Betlen
43f2907e3a
Support smaller state sizes
2023-05-03 09:33:50 -04:00
Andrei Betlen
1d47cce222
Update llama.cpp
2023-05-03 09:33:30 -04:00
Lucas Doyle
b9098b0ef7
llama_cpp server: prompt is a string
...
Not sure why this union type was here but taking a look at llama.py, prompt is only ever processed as a string for completion
This was breaking types when generating an openapi client
2023-05-02 14:47:07 -07:00
Andrei Betlen
c2e31eecee
Update permissions
2023-05-02 01:23:17 -04:00
Andrei Betlen
63f8d3a6fb
Update context
2023-05-02 01:16:44 -04:00
Andrei Betlen
c21a34506e
Update permsissions
2023-05-02 01:13:43 -04:00
Andrei Betlen
872b2ec33f
Clone submodules
2023-05-02 01:11:34 -04:00
Andrei Betlen
62de4692f2
Fix missing dependency
2023-05-02 01:09:27 -04:00
Andrei
25062cecd3
Merge pull request #140 from abetlen/Niek/main
...
Add Dockerfile
2023-05-02 01:06:00 -04:00
Andrei Betlen
36c81489e7
Remove docker section of publish
2023-05-02 01:04:36 -04:00
Andrei Betlen
5d5421b29d
Add build docker
2023-05-02 01:04:02 -04:00
Andrei Betlen
81631afc48
Install from local directory
2023-05-02 00:55:51 -04:00
Andrei Betlen
d605408f99
Add dockerignore
2023-05-02 00:55:34 -04:00
Andrei
e644e75915
Merge pull request #139 from matthoffner/patch-1
...
Fix FTYPE typo
2023-05-02 00:33:45 -04:00
Matt Hoffner
f97ff3c5bb
Update llama_cpp.py
2023-05-01 20:40:06 -07:00
Andrei Betlen
e9e0654aed
Bump version
2023-05-01 22:52:25 -04:00
Andrei
7ab08b8d10
Merge branch 'main' into better-server-params-and-fields
2023-05-01 22:45:57 -04:00
Andrei Betlen
46e3c4b84a
Fix
2023-05-01 22:41:54 -04:00
Andrei Betlen
9eafc4c49a
Refactor server to use factory
2023-05-01 22:38:46 -04:00
Andrei Betlen
dd9ad1c759
Formatting
2023-05-01 21:51:16 -04:00
Lucas Doyle
dbbfc4ba2f
llama_cpp server: fix to ChatCompletionRequestMessage
...
When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage
These fix that:
- I think `Union[Literal["user"], Literal["channel"], ...]` is the same as Literal["user", "channel", ...]
- Turns out default value `Literal["user"]` isn't JSON serializable, so replace with "user"
2023-05-01 15:38:19 -07:00
Lucas Doyle
fa2a61e065
llama_cpp server: fields for the embedding endpoint
2023-05-01 15:38:19 -07:00
Lucas Doyle
8dcbf65a45
llama_cpp server: define fields for chat completions
...
Slight refactor for common fields shared between completion and chat completion
2023-05-01 15:38:19 -07:00
Lucas Doyle
978b6daf93
llama_cpp server: add some more information to fields for completions
2023-05-01 15:38:19 -07:00
Lucas Doyle
a5aa6c1478
llama_cpp server: add missing top_k param to CreateChatCompletionRequest
...
`llama.create_chat_completion` definitely has a `top_k` argument, but its missing from `CreateChatCompletionRequest`. decision: add it
2023-05-01 15:38:19 -07:00
Lucas Doyle
1e42913599
llama_cpp server: move logprobs to supported
...
I think this is actually supported (its in the arguments of `LLama.__call__`, which is how the completion is invoked). decision: mark as supported
2023-05-01 15:38:19 -07:00
Lucas Doyle
b47b9549d5
llama_cpp server: delete some ignored / unused parameters
...
`n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. decision: delete it
2023-05-01 15:38:19 -07:00
Lucas Doyle
e40fcb0575
llama_cpp server: mark model as required
...
`model` is ignored, but currently marked "optional"... on the one hand could mark "required" to make it explicit in case the server supports multiple llama's at the same time, but also could delete it since its ignored. decision: mark it required for the sake of openai api compatibility.
I think out of all parameters, `model` is probably the most important one for people to keep using even if its ignored for now.
2023-05-01 15:38:19 -07:00