Andrei Betlen
07a56dd9c2
Update job name
2023-05-03 10:39:39 -04:00
Andrei Betlen
7839eb14d3
Add docker cuda image. Closes #143
2023-05-03 10:29:05 -04:00
Andrei Betlen
9e5b6d675a
Improve logging messages
2023-05-03 10:28:10 -04:00
Andrei Betlen
43f2907e3a
Support smaller state sizes
2023-05-03 09:33:50 -04:00
Andrei Betlen
1d47cce222
Update llama.cpp
2023-05-03 09:33:30 -04:00
Lucas Doyle
b9098b0ef7
llama_cpp server: prompt is a string
...
Not sure why this union type was here, but looking at llama.py, prompt is only ever processed as a string for completion.
The union type was breaking types when generating an OpenAPI client.
2023-05-02 14:47:07 -07:00
Andrei Betlen
c2e31eecee
Update permissions
2023-05-02 01:23:17 -04:00
Andrei Betlen
63f8d3a6fb
Update context
2023-05-02 01:16:44 -04:00
Andrei Betlen
c21a34506e
Update permissions
2023-05-02 01:13:43 -04:00
Andrei Betlen
872b2ec33f
Clone submodules
2023-05-02 01:11:34 -04:00
Andrei Betlen
62de4692f2
Fix missing dependency
2023-05-02 01:09:27 -04:00
Andrei
25062cecd3
Merge pull request #140 from abetlen/Niek/main
...
Add Dockerfile
2023-05-02 01:06:00 -04:00
Andrei Betlen
36c81489e7
Remove docker section of publish
2023-05-02 01:04:36 -04:00
Andrei Betlen
5d5421b29d
Add build docker
2023-05-02 01:04:02 -04:00
Andrei Betlen
81631afc48
Install from local directory
2023-05-02 00:55:51 -04:00
Andrei Betlen
d605408f99
Add dockerignore
2023-05-02 00:55:34 -04:00
Andrei
e644e75915
Merge pull request #139 from matthoffner/patch-1
...
Fix FTYPE typo
2023-05-02 00:33:45 -04:00
Matt Hoffner
f97ff3c5bb
Update llama_cpp.py
2023-05-01 20:40:06 -07:00
Andrei Betlen
e9e0654aed
Bump version
2023-05-01 22:52:25 -04:00
Andrei
7ab08b8d10
Merge branch 'main' into better-server-params-and-fields
2023-05-01 22:45:57 -04:00
Andrei Betlen
46e3c4b84a
Fix
2023-05-01 22:41:54 -04:00
Andrei Betlen
9eafc4c49a
Refactor server to use factory
2023-05-01 22:38:46 -04:00
Andrei Betlen
dd9ad1c759
Formatting
2023-05-01 21:51:16 -04:00
Lucas Doyle
dbbfc4ba2f
llama_cpp server: fix to ChatCompletionRequestMessage
...
Generating a client breaks because the generator fails to process the schema of ChatCompletionRequestMessage.
These changes fix that:
- I think `Union[Literal["user"], Literal["channel"], ...]` is the same as `Literal["user", "channel", ...]`
- It turns out the default value `Literal["user"]` isn't JSON serializable, so replace it with `"user"`
2023-05-01 15:38:19 -07:00
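The two claims in the commit above (dbbfc4ba2f) can be checked with the standard library alone. This is a minimal sketch, not the server's code: `RoleUnion`, `RoleLiteral`, `allowed_values`, and `is_json_serializable` are hypothetical names introduced for illustration, and only the two role values quoted in the commit message are used because the remaining ones are elided there.

```python
import json
from typing import Literal, Union, get_args

# Only the two role values quoted in the commit message; the others are elided there.
RoleUnion = Union[Literal["user"], Literal["channel"]]
RoleLiteral = Literal["user", "channel"]


def allowed_values(tp):
    """Collect the literal values a type admits, flattening a Union of Literals."""
    values = []
    for arg in get_args(tp):
        # Members of a Union are Literal types; members of a Literal are plain values.
        inner = get_args(arg)
        values.extend(inner if inner else [arg])
    return values


def is_json_serializable(value):
    """Return True if json.dumps accepts the value, False if it raises TypeError."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


# Both spellings admit the same set of values.
same_values = allowed_values(RoleUnion) == allowed_values(RoleLiteral)

# A typing object used as a default cannot be serialized; a plain string can.
literal_default_ok = is_json_serializable(Literal["user"])
string_default_ok = is_json_serializable("user")
```

At runtime the two type objects are distinct, so the comparison is made on the values they admit rather than on the types themselves.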
Lucas Doyle
fa2a61e065
llama_cpp server: fields for the embedding endpoint
2023-05-01 15:38:19 -07:00
Lucas Doyle
8dcbf65a45
llama_cpp server: define fields for chat completions
...
Slight refactor for common fields shared between completion and chat completion
2023-05-01 15:38:19 -07:00
Lucas Doyle
978b6daf93
llama_cpp server: add some more information to fields for completions
2023-05-01 15:38:19 -07:00
Lucas Doyle
a5aa6c1478
llama_cpp server: add missing top_k param to CreateChatCompletionRequest
...
`llama.create_chat_completion` definitely has a `top_k` argument, but it's missing from `CreateChatCompletionRequest`. Decision: add it.
2023-05-01 15:38:19 -07:00
Lucas Doyle
1e42913599
llama_cpp server: move logprobs to supported
...
I think this is actually supported (it's in the arguments of `Llama.__call__`, which is how the completion is invoked). Decision: mark as supported.
2023-05-01 15:38:19 -07:00
Lucas Doyle
b47b9549d5
llama_cpp server: delete some ignored / unused parameters
...
`n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported and excluded from the calls into llama. Decision: delete them.
2023-05-01 15:38:19 -07:00
Lucas Doyle
e40fcb0575
llama_cpp server: mark model as required
...
`model` is ignored, but currently marked "optional". On the one hand, it could be marked "required" to make it explicit in case the server supports multiple llamas at the same time; on the other, it could be deleted since it's ignored. Decision: mark it required for the sake of OpenAI API compatibility.
I think that, out of all the parameters, `model` is probably the most important one for people to keep using even if it's ignored for now.
2023-05-01 15:38:19 -07:00
Andrei Betlen
9d60ae56f2
Fix whitespace
2023-05-01 18:07:45 -04:00
Andrei Betlen
53c0129eb6
Update submodule clone instructions
2023-05-01 18:07:15 -04:00
Andrei Betlen
b6747f722e
Fix logprob calculation. Fixes #134
2023-05-01 17:45:08 -04:00
Andrei Betlen
c088a2b3a7
Un-skip tests
2023-05-01 15:46:03 -04:00
Andrei Betlen
bf3d0dcb2c
Fix tests
2023-05-01 15:28:46 -04:00
Andrei Betlen
5034bbf499
Bump version
2023-05-01 15:23:59 -04:00
Andrei Betlen
f073ef0571
Update llama.cpp
2023-05-01 15:23:01 -04:00
Andrei Betlen
9ff9cdd7fc
Fix import error
2023-05-01 15:11:15 -04:00
Andrei Betlen
2f8a3adaa4
Temporarily skip sampling tests.
2023-05-01 15:01:49 -04:00
Andrei Betlen
dbe0ad86c8
Update test dependencies
2023-05-01 14:50:01 -04:00
Andrei Betlen
350a1769e1
Update sampling api
2023-05-01 14:47:55 -04:00
Andrei Betlen
7837c3fdc7
Fix return types and import comments
2023-05-01 14:02:06 -04:00
Andrei Betlen
55d6308537
Fix test dependencies
2023-05-01 11:39:18 -04:00
Andrei Betlen
ccf1ed54ae
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-05-01 11:35:14 -04:00
Andrei
79ba9ed98d
Merge pull request #125 from Stonelinks/app-server-module-importable
...
Make app server module importable
2023-05-01 11:31:08 -04:00
Andrei Betlen
80184a286c
Update llama.cpp
2023-05-01 10:44:28 -04:00
Lucas Doyle
efe8e6f879
llama_cpp server: slight refactor to init_llama function
...
Define an init_llama function that starts llama with the supplied settings instead of doing it in the global context of app.py.
This makes the test less brittle: it no longer needs to modify os.environ before importing the app.
2023-04-29 11:42:23 -07:00
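The refactor described in the commit above (efe8e6f879) follows a common pattern: build the model inside a function that takes explicit settings, so importing the module has no side effects. The following is a rough sketch under assumptions, not the server's actual code: `Settings`, its fields, and `StubLlama` are hypothetical stand-ins so the example runs without a model file; only the name `init_llama` comes from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    # Hypothetical fields for illustration; the real server defines its own.
    model: str
    n_ctx: int = 2048


class StubLlama:
    """Stand-in for the real model class so the sketch runs without a model file."""

    def __init__(self, model_path: str, n_ctx: int) -> None:
        self.model_path = model_path
        self.n_ctx = n_ctx


# Module-level handle, left empty until init_llama is called explicitly.
llama: Optional[StubLlama] = None


def init_llama(settings: Settings) -> StubLlama:
    """Create the model from explicit settings rather than at import time."""
    global llama
    llama = StubLlama(settings.model, settings.n_ctx)
    return llama
```

A test can then call `init_llama(Settings(model="models/test.bin", n_ctx=16))` directly, without setting environment variables before the import.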
Lucas Doyle
6d8db9d017
tests: simple test for server module
2023-04-29 11:42:20 -07:00
Lucas Doyle
468377b0e2
llama_cpp server: app is now importable, still runnable as a module
2023-04-29 11:41:25 -07:00