Andrei
f712a04f4e
Merge pull request #157 from th-neu/th-neu-readme-windows
...
readme windows
2023-05-05 12:40:45 -04:00
Thomas Neu
22c3056b2a
Update README.md
...
added macOS
2023-05-05 18:40:00 +02:00
Andrei Betlen
b6a9a0b6ba
Add types for all low-level api functions
2023-05-05 12:22:27 -04:00
Andrei Betlen
5be0efa5f8
Cache should raise KeyError when key is missing
2023-05-05 12:21:49 -04:00
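The behaviour named in commit 5be0efa5f8 (a missing key raises `KeyError` instead of silently returning nothing) can be sketched roughly as follows. This is a minimal illustration, not the project's cache: the class name `TokenCache`, its storage layout, and the use of token lists as keys are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


class TokenCache:
    """Hypothetical cache keyed by token sequences (illustrative only)."""

    def __init__(self) -> None:
        # Lists are not hashable, so keys are stored as tuples.
        self._store: Dict[Tuple[int, ...], str] = {}

    def __setitem__(self, key: List[int], value: str) -> None:
        self._store[tuple(key)] = value

    def __contains__(self, key: List[int]) -> bool:
        return tuple(key) in self._store

    def __getitem__(self, key: List[int]) -> str:
        k = tuple(key)
        if k not in self._store:
            # Follow the mapping protocol: signal a miss with KeyError
            # rather than returning None.
            raise KeyError("key not found in cache")
        return self._store[k]
```

Raising `KeyError` keeps the cache consistent with Python's mapping protocol, so callers can use the usual `try`/`except KeyError` or `in` checks.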
Andrei Betlen
24fc38754b
Add cli options to server. Closes #37
2023-05-05 12:08:28 -04:00
Thomas Neu
eb54e30f34
Update README.md
2023-05-05 14:22:41 +02:00
Thomas Neu
952ba9ecaf
Update README.md
...
add windows server command
2023-05-05 14:21:57 +02:00
Andrei Betlen
5f583b0179
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-05-04 21:59:40 -04:00
Andrei Betlen
5c165a85da
Bump version
2023-05-04 21:59:37 -04:00
Andrei Betlen
853dc711cc
Format
2023-05-04 21:58:36 -04:00
Andrei Betlen
97c6372350
Rewind model to longest prefix.
2023-05-04 21:58:27 -04:00
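Commit 97c6372350 gives only the subject line, so the following is a guess at the idea rather than the actual change: when a new prompt shares its leading tokens with what has already been evaluated, only the differing tail needs to be re-evaluated. The function name and the plain integer token lists are assumptions for illustration.

```python
from typing import Sequence


def longest_common_prefix_len(cached: Sequence[int], prompt: Sequence[int]) -> int:
    """Return how many leading tokens `cached` and `prompt` share."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


# Hypothetical token ids: the first two tokens match, the third differs.
cached_tokens = [1, 15, 27, 99]
new_prompt = [1, 15, 42]
keep = longest_common_prefix_len(cached_tokens, new_prompt)
# Tokens before `keep` could be reused; the rest would be evaluated again.
tokens_to_evaluate = new_prompt[keep:]
```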
Andrei
38b8eeea58
Merge pull request #154 from th-neu/th-neu-dockerfile-slim
...
Slim-Bullseye based docker image
2023-05-04 19:59:23 -04:00
Thomas Neu
5672ed7fea
Merge branch 'abetlen:main' into th-neu-dockerfile-slim
2023-05-04 21:41:13 +02:00
Thomas Neu
501321875f
Slim-Bullseye based docker image
...
ends up at ~669MB
2023-05-04 21:03:19 +02:00
Mug
0e9f227afd
Update low level examples
2023-05-04 18:33:08 +02:00
Andrei Betlen
cabd8b8ed1
Bump version
2023-05-04 12:21:20 -04:00
Andrei Betlen
d78cec67df
Update llama.cpp
2023-05-04 12:20:25 -04:00
Andrei Betlen
329297fafb
Bugfix: Missing logits_to_logprobs
2023-05-04 12:18:40 -04:00
Andrei Betlen
d594892fd4
Remove Docker CUDA build job
2023-05-04 00:02:46 -04:00
Andrei Betlen
0607f6578e
Use network installer for cuda
2023-05-03 23:22:16 -04:00
Andrei Betlen
6d3c20e39d
Add CUDA docker image build to github actions
2023-05-03 22:20:53 -04:00
Lucas Doyle
3008a954c1
Merge branch 'main' of github.com:abetlen/llama-cpp-python into better-server-params-and-fields
2023-05-03 13:10:03 -07:00
Andrei Betlen
a02aa121da
Remove cuda build job
2023-05-03 10:50:48 -04:00
Andrei Betlen
07a56dd9c2
Update job name
2023-05-03 10:39:39 -04:00
Andrei Betlen
7839eb14d3
Add docker cuda image. Closes #143
2023-05-03 10:29:05 -04:00
Andrei Betlen
9e5b6d675a
Improve logging messages
2023-05-03 10:28:10 -04:00
Andrei Betlen
43f2907e3a
Support smaller state sizes
2023-05-03 09:33:50 -04:00
Andrei Betlen
1d47cce222
Update llama.cpp
2023-05-03 09:33:30 -04:00
Lucas Doyle
b9098b0ef7
llama_cpp server: prompt is a string
...
Not sure why this union type was here, but looking at llama.py, prompt is only ever processed as a string for completion.
This was breaking types when generating an OpenAPI client.
2023-05-02 14:47:07 -07:00
Lucas Doyle
0fcc25cdac
examples fastapi_server: deprecate
...
This commit "deprecates" the example fastapi server: it remains runnable but points folks at the module if they want to learn more.
Rationale:
Currently there exist two server implementations in this repo:
- `llama_cpp/server/__main__.py`, the module that's runnable by consumers of the library with `python3 -m llama_cpp.server`
- `examples/high_level_api/fastapi_server.py`, which is probably a copy-pasted example by folks hacking around
IMO this is confusing. As a new user of the library I see they've both been updated relatively recently, but comparing them side by side there's a diff.
The one in the module seems better:
- supports logits_all
- supports use_mmap
- has experimental cache support (with some mutex thing going on)
- some stuff with streaming support was moved around more recently than fastapi_server.py
2023-05-01 22:34:23 -07:00
Andrei Betlen
c2e31eecee
Update permissions
2023-05-02 01:23:17 -04:00
Andrei Betlen
63f8d3a6fb
Update context
2023-05-02 01:16:44 -04:00
Andrei Betlen
c21a34506e
Update permissions
2023-05-02 01:13:43 -04:00
Andrei Betlen
872b2ec33f
Clone submodules
2023-05-02 01:11:34 -04:00
Andrei Betlen
62de4692f2
Fix missing dependency
2023-05-02 01:09:27 -04:00
Andrei
25062cecd3
Merge pull request #140 from abetlen/Niek/main
...
Add Dockerfile
2023-05-02 01:06:00 -04:00
Andrei Betlen
36c81489e7
Remove docker section of publish
2023-05-02 01:04:36 -04:00
Andrei Betlen
5d5421b29d
Add build docker
2023-05-02 01:04:02 -04:00
Andrei Betlen
81631afc48
Install from local directory
2023-05-02 00:55:51 -04:00
Andrei Betlen
d605408f99
Add dockerignore
2023-05-02 00:55:34 -04:00
Andrei
e644e75915
Merge pull request #139 from matthoffner/patch-1
...
Fix FTYPE typo
2023-05-02 00:33:45 -04:00
Matt Hoffner
f97ff3c5bb
Update llama_cpp.py
2023-05-01 20:40:06 -07:00
Andrei Betlen
e9e0654aed
Bump version
2023-05-01 22:52:25 -04:00
Andrei
7ab08b8d10
Merge branch 'main' into better-server-params-and-fields
2023-05-01 22:45:57 -04:00
Andrei Betlen
46e3c4b84a
Fix
2023-05-01 22:41:54 -04:00
Andrei Betlen
9eafc4c49a
Refactor server to use factory
2023-05-01 22:38:46 -04:00
Andrei Betlen
dd9ad1c759
Formatting
2023-05-01 21:51:16 -04:00
Lucas Doyle
dbbfc4ba2f
llama_cpp server: fix to ChatCompletionRequestMessage
...
When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage.
These changes fix that:
- I think `Union[Literal["user"], Literal["channel"], ...]` is the same as `Literal["user", "channel", ...]`
- Turns out the default value `Literal["user"]` isn't JSON serializable, so replace it with `"user"`
2023-05-01 15:38:19 -07:00
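The two observations in commit dbbfc4ba2f can be checked with the standard library alone. The sketch below is illustrative only: the alias names and both helper functions are hypothetical, and the role values are copied from the commit message rather than from the server's schema.

```python
import json
from typing import Literal, Union, get_args

# Hypothetical aliases; the role names follow the commit message.
RoleUnion = Union[Literal["user"], Literal["channel"]]
RoleLiteral = Literal["user", "channel"]


def allowed_values(tp) -> set:
    """Flatten a Literal, or a Union of Literals, into its allowed values."""
    values = set()
    for arg in get_args(tp):
        if get_args(arg):
            # A Union member is itself a Literal[...] alias; recurse into it.
            values |= allowed_values(arg)
        else:
            values.add(arg)
    return values


def is_json_serializable(obj) -> bool:
    """Report whether json.dumps accepts `obj`."""
    try:
        json.dumps(obj)
        return True
    except TypeError:
        return False


# Both spellings admit the same set of strings.
same = allowed_values(RoleUnion) == allowed_values(RoleLiteral)
# The type object is not valid JSON, while the plain string is.
type_ok = is_json_serializable(Literal["user"])
value_ok = is_json_serializable("user")
```

Under these assumptions, `same` and `value_ok` are true and `type_ok` is false. This matches the commit's reasoning: a default should be the string value, not the `Literal` type that describes it.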
Lucas Doyle
fa2a61e065
llama_cpp server: fields for the embedding endpoint
2023-05-01 15:38:19 -07:00
Lucas Doyle
8dcbf65a45
llama_cpp server: define fields for chat completions
...
Slight refactor for common fields shared between completion and chat completion
2023-05-01 15:38:19 -07:00